Philip Guo (Phil Guo, Philip J. Guo, Philip Jia Guo, pgbovine)

Python data analysis screencast

Summary
Here is a 45-minute coding session where I used Python to analyze data for one of my research projects.

This set of videos is useful for someone who wants to learn how to use Python 2.7 and the Python standard library for basic data analysis.

I'm trying to get real work done here, so this isn't a canned demo for teaching purposes. The main change in my behavior was thinking out loud to narrate the video. I also worked a bit slower than usual, but I probably couldn't have done this task much faster even if I weren't being taped. (If I tried to go faster, then I would've introduced more bugs, which would've taken time to fix.)

Here we go!

Part 1 (Duration: 5:28)

Introducing the goal, data set, and parsing a CSV file into a dict:
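The code written in the video is not reproduced on this page. As a rough sketch of what this kind of parsing can look like, the example below reads CSV text with the standard csv module and turns each row into a dict. The column names (user_id, timestamp, event) and the sample rows are hypothetical, not the real data set, and the sketch is written to run on Python 3 rather than the Python 2.7 used in the session.

```python
import csv
import io

# Hypothetical sample data; the real data set from the video is not shown here.
SAMPLE_CSV = """user_id,timestamp,event
alice,3,run
bob,1,edit
alice,1,edit
"""


def parse_rows(fileobj):
    """Return a list of dicts, one per CSV row, keyed by the header names."""
    rows = []
    for row in csv.DictReader(fileobj):
        row = dict(row)
        # csv yields every field as a string, so convert numeric columns by hand.
        row["timestamp"] = int(row["timestamp"])
        rows.append(row)
    return rows


# io.StringIO stands in for a file opened with open("data.csv").
rows = parse_rows(io.StringIO(SAMPLE_CSV))
```

With a real file, the same function would be called on the object returned by open() instead of the in-memory buffer.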

Part 2 (8:18)

Grouping and sorting the parsed data using a defaultdict of lists:
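A minimal sketch of the grouping idiom named here, assuming rows that have already been parsed into dicts. The field names and sample rows are hypothetical and are not taken from the video.

```python
from collections import defaultdict

# Hypothetical parsed rows; the real fields in the video may differ.
rows = [
    {"user_id": "alice", "timestamp": 3, "event": "run"},
    {"user_id": "bob", "timestamp": 1, "event": "edit"},
    {"user_id": "alice", "timestamp": 1, "event": "edit"},
]


def group_by_user(rows):
    """Group rows by user_id, then sort each group by timestamp."""
    groups = defaultdict(list)
    for row in rows:
        # A missing key starts out as an empty list, so no existence check is needed.
        groups[row["user_id"]].append(row)
    for user_rows in groups.values():
        user_rows.sort(key=lambda r: r["timestamp"])
    return groups


by_user = group_by_user(rows)
```

The defaultdict removes the usual "if key not in d: d[key] = []" boilerplate, which keeps the grouping loop to a single line.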

Part 3 (10:15)

More grouping, pairing up consecutive entries, and using a Counter to count occurrences:
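One common way to pair up consecutive entries is to zip a list with itself shifted by one position, then feed the pairs to a Counter. The sketch below shows that idiom on a hypothetical list of event names. It may not match the exact approach taken in the video.

```python
from collections import Counter

# Hypothetical, already-sorted sequence of events for one group.
events = ["edit", "run", "edit", "run", "run"]


def count_transitions(seq):
    """Count how often each (current, next) pair of consecutive entries occurs."""
    # zip stops at the shorter input, so the last element has no partner.
    pairs = zip(seq, seq[1:])
    return Counter(pairs)


transitions = count_transitions(events)
```

Here transitions[("edit", "run")] is 2, because "edit" is directly followed by "run" twice in the sample sequence.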

Part 4 (10:00)

Running on multiple data files, testing performance, doing sanity checks, and using cPickle to cache intermediate results to avoid unnecessarily re-running slow code:
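The caching idea can be sketched as a small "load if cached, otherwise compute and save" helper. The function names, the cache file name, and the stand-in analysis below are all hypothetical. The import falls back to pickle because Python 3 has no cPickle module (its pickle uses the C implementation automatically).

```python
import os
import tempfile

try:
    import cPickle as pickle  # Python 2.7, as used in the video
except ImportError:
    import pickle  # Python 3: no cPickle module, pickle is already C-accelerated


def load_or_compute(cache_path, compute):
    """Return the cached result if present; otherwise run compute() and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute()
    with open(cache_path, "wb") as f:
        pickle.dump(result, f, pickle.HIGHEST_PROTOCOL)
    return result


calls = []  # records how many times the slow step actually runs


def slow_analysis():
    """Hypothetical stand-in for the slow parsing and grouping code."""
    calls.append(1)
    return {"edit->run": 2, "run->run": 1}


# A fresh temporary directory keeps this example from touching real files.
cache_path = os.path.join(tempfile.mkdtemp(), "results.pickle")
first = load_or_compute(cache_path, slow_analysis)   # computes and writes the cache
second = load_or_compute(cache_path, slow_analysis)  # reads the cache instead
```

One caveat with this pattern: the cache file has to be deleted by hand whenever the slow code or its input changes, otherwise stale results are loaded.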

Part 5 (10:48)

Finalizing the analysis by calculating aggregate stats from the cached (pickled) data and printing out the final results:
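As an illustration of this last step, the sketch below merges per-file Counter objects into one total and formats a small report with percentages. The file names and counts are made up. In a real run, the per-file counts would be loaded from the pickle cache rather than written inline.

```python
from collections import Counter

# Hypothetical per-file results, standing in for data loaded from a pickle cache.
per_file = {
    "a.csv": Counter({("edit", "run"): 2, ("run", "run"): 1}),
    "b.csv": Counter({("edit", "run"): 1, ("run", "edit"): 3}),
}


def aggregate(per_file_counts):
    """Sum the per-file Counters into a single Counter."""
    total = Counter()
    for counts in per_file_counts.values():
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total


def format_report(total):
    """Return one line per pair, most frequent first, with its share of the total."""
    grand_total = sum(total.values())
    lines = []
    for (first, second), n in total.most_common():
        pct = 100.0 * n / grand_total
        lines.append("%-5s -> %-5s %3d (%.1f%%)" % (first, second, n, pct))
    return lines


total = aggregate(per_file)
for line in format_report(total):
    print(line)
```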

Created: 2014-10-20
Last modified: 2014-12-18