In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, records of their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCR’d the data they contain, and we now want to store that information in a way that will make search and analysis easy.
Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.
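To make the contrast concrete, here is a minimal sketch of what "search and analysis" can look like when a database does the work. It uses Python's built-in `sqlite3` module with an in-memory database. The table name `readings`, its columns, and every value in it are hypothetical and invented for illustration; they are not taken from the expeditions' records.

```python
import sqlite3

# Hypothetical table and made-up values, purely for illustration;
# none of this comes from the expeditions' actual data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, quantity TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("A", "temp", -21.5),
        ("A", "temp", -16.0),
        ("B", "temp", -18.5),
        ("B", "sal", 0.25),
    ],
)

# Search: one declarative query instead of a hand-written filtering loop.
cold = conn.execute(
    "SELECT site, value FROM readings "
    "WHERE quantity = 'temp' AND value < -18 ORDER BY value"
).fetchall()

# Analysis: per-site averages computed by the database itself.
averages = conn.execute(
    "SELECT site, AVG(value) FROM readings "
    "WHERE quantity = 'temp' GROUP BY site ORDER BY site"
).fetchall()

print(cold)      # [('A', -21.5), ('B', -18.5)]
print(averages)  # [('A', -18.75), ('B', -18.5)]
conn.close()
```

With a plain text file, both the filter and the grouping would have to be written by hand. Here each is a single query, and the same queries would work unchanged on a much larger table.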
- Unix shell plus SQLite3 or Firefox SQLite plugin.
|00:00||Selecting Data||How can I get data from a database?|
|00:15||Sorting and Removing Duplicates||How can I sort a query’s results? How can I remove duplicate values from a query’s results?|
|00:35||Filtering||How can I select subsets of data?|
|00:55||Calculating New Values||How can I calculate new values on the fly?|
|||Missing Data||How do databases represent missing information? What special handling does missing information require?|
|01:35||Aggregation||How can I calculate sums, averages, and other summary values?|
|01:55||Combining Data||How can I combine data from multiple tables?|
|02:35||Data Hygiene||How should I format data in a database, and why?|
|03:05||Creating and Modifying Data||How can I create, modify, and delete tables and data?|
|03:30||Programming with Databases - Python||How can I access databases from programs written in Python?|
|04:05||Programming with Databases - R||How can I access databases from programs written in R?|
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.