Screencast: Grabbing data from Wikipedia using Python

Mar 15, 2014 • Matt Hall

Continuing on from my multiple choice questions, I show how to grab some data from a Wikipedia article using Python’s built-in urllib2 library, then process it using the built-in re library. Seven lines of code, easily convertible to a function — which could make a nice exercise.
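For readers who want something to type along with, here is a minimal sketch of the fetch-then-regex idea. It is not the code from the screencast. It assumes Python 3, where `urllib.request` replaces `urllib2`, and the function names, the User-Agent string, the example URL and the `<title>` pattern are all placeholders chosen for illustration.

```python
import re
from urllib.request import Request, urlopen  # Python 3 replacement for urllib2


def fetch_html(url):
    """Download a page and return its HTML as text (assumes UTF-8)."""
    # Sending a descriptive User-Agent is polite; this one is a made-up placeholder.
    request = Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urlopen(request) as response:
        return response.read().decode("utf-8")


def find_all(html, pattern):
    """Return every capture of the regex `pattern` found in `html`."""
    return re.findall(pattern, html)


# Hypothetical usage. It makes one network request, so it is left commented out:
# html = fetch_html("https://en.wikipedia.org/wiki/Jurassic")
# print(find_all(html, r"<title>(.*?)</title>"))
```

Keeping the download and the pattern matching in separate functions means the regex can be tried on a saved string without hitting the server again, which also helps keep the scraping light.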

This is a short example, but not a toy: I do this sort of thing quite a bit. So I think it’s relevant and useful. There’s some background on web scraping responsibly, of course, but this example is very light and won’t hurt any servers.

As usual, I had some trouble suppressing my perfectionist tendencies. I had to do this in a large room, hence the cave-like acoustics. I made one tweak to the audio in iMovie — a loud and inexplicable lip-smack at the end. Unsuppressed satisfaction at hitting 2:55 mins, no doubt. And yeah, OK, I had to do it about 8 times because there’s always some typo somewhere. (I actually like the mis-steps in live demos — and I think they’re instructive — but there wasn’t really time for them here.)

Thanks for watching!