Databases and SQL: Instructor Notes

database (dā’tə-bās’) noun: “A collection of data arranged for ease and speed of search and retrieval by a computer”

— The American Heritage® Science Dictionary


Relational databases are not as widely used in science as in business, but they are still a common way to store large data sets with complex structure. Even when the data itself isn’t in a database, the metadata could be: for example, meteorological data might be stored in files on disk, but data about when and where observations were made, data ranges, and so on could be in a database to make it easier for scientists to find what they want to.


SQLite Setup

In order to execute the following lessons interactively, please install SQLite as mentioned in the setup instructions for your workshop. Then:

$ git clone
$ cd sql-novice-survey

Next, create the database that will be used:

$ sqlite3 survey.sqlite .read bin/create-db.sql

This reads commands from bin/create-db.sql, which sets up the tables and loads data from the CSV files in the data directory.

To run commands interactively, run SQLite on survey.sqlite:

$ sqlite3 survey.sqlite
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.


The command history and line editing features provided by readline are invaluable with a command-line tool like sqlite3. Participants should be encouraged strongly to start with a simple SQL statement and then use the up-arrow key to go back and add clauses one at a time, or fix problems, rather than typing each command from scratch. Unfortunately on some Linux and Mac OS X systems participants have found that the arrow keys do not scroll through the command history as expected.

A workaround for this it to use the rlwrap (readline wrapper) command when starting SQLite:

$ rlwrap sqlite3 survey.sqlite

The rlwrap package is available in the standard Fedora repository (but wasn’t needed when I [@benwaugh] taught this) and appears to be available in Ubuntu too, and in OS X using Homebrew.