Overview

Teaching: 15 min
Exercises: 0 min
Questions
  • FIXME

Objectives
  • Write Python programs to download data sets using simple REST APIs.

A growing number of organizations make data sets available on the web in a style called REST, which stands for REpresentational State Transfer. The details (and ideology) aren’t important; what matters is that when REST is used, every data set is identified by a URL and can be accessed through a set of functions called an application programming interface (API).

For example, the World Bank’s Climate Data API provides data generated by 15 global circulation models. According to the API’s home page, the data sets containing yearly averages for various values are identified by URLs of the form:

http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/var/year/iso3.ext

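where:

  • var is the variable we want (tas, used below, is the temperature variable);
  • iso3 is the three-letter ISO code of the country (for example, CAN for Canada);
  • ext is the file extension that selects the format of the returned data (for example, csv).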

For example, if we want the average annual temperature in Canada as a CSV file, the URL is:

http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv

If we paste that URL into a browser, it displays:

year,data
1901,-7.67241907119751
1902,-7.862711429595947
1903,-7.910782814025879
...
2007,-6.819293975830078
2008,-7.2008957862854
2009,-6.997011661529541
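
Because every data set follows the same URL pattern, a URL can be assembled from its three variable parts. The following is a minimal sketch, assuming the pattern quoted from the API's home page; the names BASE_URL and build_url are hypothetical and introduced only for illustration:

```python
# Hypothetical helper: fills in the URL pattern quoted from the API's home page.
BASE_URL = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru'


def build_url(var, iso3, ext):
    """Return the URL for yearly averages of `var` in country `iso3` in format `ext`."""
    return '{0}/{1}/year/{2}.{3}'.format(BASE_URL, var, iso3, ext)


# Reproduces the URL for average annual temperature in Canada as CSV.
print(build_url('tas', 'CAN', 'csv'))
```

Keeping the fixed part of the URL in one place means only the three fields that actually vary have to be supplied.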

Behind the Scenes

This particular data set might be stored in a file on the World Bank’s server, or that server might:

  1. Receive our URL.
  2. Break it into pieces.
  3. Extract the three key fields (the variable, the country code, and the desired format).
  4. Fetch the desired data from a database.
  5. Format the data as CSV.
  6. Send that to our browser.

As long as the World Bank doesn’t change its URLs, we don’t need to know which method it’s using, and it can switch between them without breaking our programs.
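
Steps 2 and 3 of the list above can be sketched in a few lines of Python. This only illustrates the idea and is not the World Bank's actual server code; the function name extract_fields is hypothetical, and the sketch assumes every URL ends with the var/year/iso3.ext pattern:

```python
from urllib.parse import urlparse


def extract_fields(url):
    """Split a Climate Data API URL into (variable, country code, format)."""
    # Keep only the path, e.g. '/climateweb/rest/v1/country/cru/tas/year/CAN.csv'.
    path = urlparse(url).path
    pieces = path.strip('/').split('/')
    # Assumption: the path always ends with .../var/year/iso3.ext
    var = pieces[-3]
    iso3, ext = pieces[-1].rsplit('.', 1)
    return var, iso3, ext


url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
print(extract_fields(url))  # ('tas', 'CAN', 'csv')
```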

If we only wanted to look at data for a couple of countries, we could just download those files one by one. But we want to compare data for many different pairs of countries, so we should write a program.

Python’s standard library includes a module for working with URLs (urllib2 in Python 2, urllib.request in Python 3). It is clumsy to use, though, so many people (including us) prefer a third-party library called Requests. To install it, run the command:

$ pip install requests

Installing with Pip

Note that pip is a program in its own right, so the command above must be run in the shell, and not from within Python itself.
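
To check from within Python whether Requests is already available, one option is to ask the import system to locate the module without importing it. A minimal sketch; is_installed is a hypothetical helper name:

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


# Prints True or False depending on whether Requests is installed here.
print(is_installed('requests'))
```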

If Requests is not already installed, pip’s output is:

Downloading/unpacking requests
  Downloading requests-2.7.0-py2.py3-none-any.whl (470kB): 470kB downloaded
Installing collected packages: requests
Successfully installed requests
Cleaning up...

If it’s already present, the output will be:

Requirement already satisfied (use --upgrade to upgrade): requests in /Users/swc/anaconda/lib/python2.7/site-packages
Cleaning up...

Either way, we can now get the data we want like this:

import requests
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
response = requests.get(url)
if response.status_code != 200:
    print('Failed to get data:', response.status_code)
else:
    print('First 100 characters of data are')
    print(response.text[:100])

First 100 characters of data are
year,data
1901,-7.67241907119751
1902,-7.862711429595947
1903,-7.910782814025879
1904,-8.15572929382

The first line imports the requests library. The second defines the URL for the data we want; we could just pass this URL as an argument to the requests.get call on the third line, but assigning it to a variable makes it easier to find.

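requests.get actually gets our data. More specifically, it:

  1. Opens a connection to the server named in the URL.
  2. Sends an HTTP GET request for that URL.
  3. Waits for the server’s reply.
  4. Returns a response object holding the status code (response.status_code) and the body of the reply (response.text).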

The server can return many different status codes; the most common are:

Code  Name                   Meaning
200   OK                     The request has succeeded.
204   No Content             The server has completed the request, but doesn’t need to return any data.
400   Bad Request            The request is badly formatted.
401   Unauthorized           The request requires authentication.
404   Not Found              The requested resource could not be found.
408   Request Timeout        The server gave up waiting for the client.
418   I’m a teapot           No, really…
500   Internal Server Error  An error occurred in the server.

200 (OK) is the one we want; if we get anything else, the response probably doesn’t contain actual data (though it might contain an error message).
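
Python's standard library knows the standard names of these codes, so a program can report them in readable form. A minimal sketch using http.HTTPStatus; the helper name describe is hypothetical:

```python
from http import HTTPStatus


def describe(status_code):
    """Return 'code name' for a standard HTTP status code, e.g. '404 Not Found'."""
    try:
        return '{0} {1}'.format(status_code, HTTPStatus(status_code).phrase)
    except ValueError:
        # Not a code the standard library recognizes.
        return '{0} (unknown status code)'.format(status_code)


for code in (200, 404, 500):
    print(describe(code))
```

A call such as describe(response.status_code) could replace the bare number in the error message printed by the program above.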

Some People Don’t Follow the Rules

Unfortunately, some sites don’t return a meaningful status code. Instead, they return 200 for everything, then put an error message (if appropriate) in the text of the response. This works when the result is being displayed to a human being, but fails miserably when the “reader” is a program that can’t actually read.
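
One defensive option is to check that the body looks like the data we expect before using it. A minimal sketch, assuming the CSV data always begins with the header line year,data as in the output shown earlier; EXPECTED_HEADER and looks_like_data are hypothetical names:

```python
# Assumption: valid CSV responses start with this header line.
EXPECTED_HEADER = 'year,data'


def looks_like_data(text, expected_header=EXPECTED_HEADER):
    """Return True if the first line of `text` is the expected CSV header."""
    lines = text.splitlines()
    return bool(lines) and lines[0].strip() == expected_header


print(looks_like_data('year,data\n1901,-7.67241907119751\n'))  # True
print(looks_like_data('Error: no such country'))               # False
```

A check like this is specific to one data format, so it complements the status code test and does not replace it.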

Defining REST API

A REST API is:

  1. A data format.
  2. A way of accessing data via a URL.
  3. Less work for the server.
  4. Only accessible via Python libraries like Requests.

Get Data for Guatemala

Modify the little program above to fetch temperatures for Guatemala.

How Hot is Afghanistan?

Read the documentation for the Climate Data API, and then write URLs to find the annual average temperature for Afghanistan between 1980 and 1999.

Key Points