Summary and Schedule
In the late 1920s and early 1930s, William Dyer, Frank Pabodie, and Valentina Roerich led expeditions to the Pole of Inaccessibility in the South Pacific, and then onward to Antarctica. Two years ago, their expeditions were found in a storage locker at Miskatonic University. We have scanned and OCR the data they contain, and we now want to store that information in a way that will make search and analysis easy.
Three common options for storage are text files, spreadsheets, and databases. Text files are easiest to create, and work well with version control, but then we would have to build search and analysis tools ourselves. Spreadsheets are good for doing simple analyses, but they don’t handle large or complex data sets well. Databases, however, include powerful tools for search and analysis, and can handle large, complex data sets. These lessons will show how to use a database to explore the expeditions’ data.
Prerequisites
- This lesson requires the Unix shell, plus SQLite3 or DB Browser for SQLite.
- Please download the database we will use: survey.db
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Selecting Data | How can I get data from a database? |
Duration: 00h 15m | 2. Sorting and Removing Duplicates |
How can I sort a query’s results? How can I remove duplicate values from a query’s results? |
Duration: 00h 35m | 3. Filtering | How can I select subsets of data? |
Duration: 00h 55m | 4. Calculating New Values | How can I calculate new values on the fly? |
Duration: 01h 05m | 5. Missing Data |
How do databases represent missing information? What special handling does missing information require? |
Duration: 01h 35m | 6. Aggregation | How can I calculate sums, averages, and other summary values? |
Duration: 01h 55m | 7. Combining Data | How can I combine data from multiple tables? |
Duration: 02h 35m | 8. Data Hygiene | How should I format data in a database, and why? |
Duration: 03h 05m | 9. Creating and Modifying Data | How can I create, modify, and delete tables and data? |
Duration: 03h 30m | 10. Programming with Databases - Python | How can I access databases from programs written in Python? |
Duration: 04h 05m | 11. Programming with Databases - R | How can I access databases from programs written in R? |
Duration: 04h 50m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Software
For this course you will need the UNIX shell (described in the UNIX Shell lesson), plus SQLite3 or DB Browser for SQLite.
If you are running macOS you should already have
SQLite installed. You can run sqlite3 --version
in a
terminal to confirm that it is available. You can also download DB
Browser for SQLite from their
website
If you are running Linux, you may already have
SQLite3 installed, please use the command which sqlite3
to
see the path of the program, otherwise you should be able to get it from
your package manager (on Debian/Ubuntu, you can use the command
apt install sqlite3
).
If you are running Windows, download installers and
run them as administrator. Make sure you select the right installer
version for your system. If the installer asks to add the path to the
environment variables, check yes, otherwise you have to manually add the
path of the executable to the PATH
environmental variables.
This path informs the system where to find the executable program.
If installing SQLite3 using Anaconda, refer to the anaconda sqlite docs.
After the installation and the setting of the paths, close the terminal and reopen a new terminal. This enables paths and configurations to be loaded.
Files
Please download the database we’ll be using: survey.db