What factors impact the comprehensibility of code? In this blog post, I'll describe an experiment I did with my advisors Andrew Lumsdaine (Computer Science) and Rob Goldstone (Cognitive Science) at Indiana University.
We asked 162 programmers to predict the output of 10 small Python programs. Each program had 2 or 3 different versions, and we used subtle differences between program versions to demonstrate that seemingly insignificant notational changes can have big effects on correctness and response times. I'll go over some of the results here, hopefully to whet your appetite for the paper.
When dealing with numeric matrices and vectors in Python, NumPy makes life a lot easier. For more complex data, however, it leaves a lot to be desired. If you're used to working with data frames in R, doing data analysis directly with NumPy feels like a step back.
Fortunately, some nice folks have written the Python Data Analysis Library (a.k.a. pandas).
Pandas provides an R-like DataFrame, produces high-quality plots with matplotlib, and integrates nicely with other libraries that expect NumPy arrays.
In this tutorial, we'll go through the basics of pandas using a year's worth of weather data from Weather Underground. Pandas has a lot of functionality, so we'll only be able to cover a small fraction of what you can do. Check out the (very readable) pandas docs if you want to learn more.
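As a small taste of the DataFrame idea described above, here is a minimal sketch. The column names and values are made up for illustration and are not taken from the Weather Underground data; it assumes a reasonably recent pandas that provides `to_numpy()`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not real weather readings.
df = pd.DataFrame({
    "max_temp": [50, 43, 61],
    "min_temp": [32, 28, 40],
    "events": ["Rain", "Snow", "Fog"],
})

# Column-wise arithmetic, much like an R data frame.
df["temp_range"] = df["max_temp"] - df["min_temp"]

# Numeric columns convert to a plain NumPy array for libraries that expect one.
temps = df[["min_temp", "max_temp"]].to_numpy()

print(df)
print(temps.shape)
```

Mixed column types (numbers next to strings) are the main thing a DataFrame gives you over a bare NumPy array.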
From time to time, I come across or come up with interesting ways to solve problems in Python. To avoid forgetting them, I plan to update this post as I add more recipes to my collection.
If you know of a better way to do something, let me know!
As my fellow Ph.D. student Eric Holk talked about recently in his blog, I've been running eye-tracking experiments with programmers of different experience levels. In the experiment, a programmer is tasked with predicting the output of 10 short Python programs. A Tobii TX300 eye tracker keeps track of their eyes at 300 Hz, allowing me to see where they're spending their time.
For this tutorial, we'll be plotting some weather data from a site called Weather Underground. You can download temperature readings and weather events for your local area in a comma-separated file.
I've put weather data for Bloomington, IN in a file called weather.csv. Each row is one day, and there are columns for min/mean/max temperature, dew point, wind speed, etc. We'll be plotting temperature and weather event data (e.g., rain, snow).
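To make the file layout concrete, here is a rough sketch of loading one-row-per-day data with the standard `csv` module. The column names and sample rows below are assumptions for illustration; the real weather.csv may use different headers.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for weather.csv; real column names may differ.
SAMPLE = """\
date,max_temp,mean_temp,min_temp,events
2012-01-01,40,33,26,
2012-01-02,35,30,25,Snow
2012-01-03,45,38,31,Rain
2012-01-04,50,44,38,Rain
"""

def load_weather(handle):
    """Return one dict per day with the temperature columns converted to int."""
    rows = []
    for row in csv.DictReader(handle):
        for key in ("max_temp", "mean_temp", "min_temp"):
            row[key] = int(row[key])
        rows.append(row)
    return rows

days = load_weather(io.StringIO(SAMPLE))
max_temps = [day["max_temp"] for day in days]

# Count only the days that actually recorded an event.
event_counts = Counter(day["events"] for day in days if day["events"])

print(max_temps)
print(event_counts)
```

With a real file, you would pass `open("weather.csv", newline="")` to `load_weather` instead of the in-memory sample.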
Let's say you have a text file called workout.csv that contains information about your workouts for the month of March:
# date, kind of workout, distance (miles), time (min)
"2012, Mar-01", run, 2, 25
"2012, Mar-03", bike, 10, 55
"2012, Mar-06", bike, 5, 20
"2012, Mar-09", run, 3, 42
"2012, Mar-10", skateboarding, 2, 10

# Broke my leg :(
"2012, Mar-11", Wii, 0, 60
"2012, Mar-12", Wii, 0, 60
"2012, Mar-13", Wii, 0, 60
"2012, Mar-14", Wii, 0, 60
It's a comma-separated value (CSV) file, but it contains comments and blank lines. The first line (a comment) describes the fields in this file, which are (from left to right) the date of your workout, the kind of workout, how many miles you traveled, and how many minutes you spent (note: I didn't actually break my leg, it's just an example!).
Our goal will be to read this data into Python and plot a graph with the day of the month on the x-axis and the time worked out on the y-axis. Let's get started.
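One possible way to approach this goal, sketched with the standard library only: filter out comments and blank lines, let `csv.reader` handle the quoted dates, and parse each date with `strptime`. The helper names `read_workouts` and `plot_workouts` are hypothetical, the sample is an abbreviated copy of the file, and the plotting function assumes matplotlib is installed (it is imported lazily so the parsing works without it).

```python
import csv
import io
from datetime import datetime

# Abbreviated copy of workout.csv, used so the sketch is self-contained.
SAMPLE = '''\
# date, kind of workout, distance (miles), time (min)
"2012, Mar-01", run, 2, 25
"2012, Mar-03", bike, 10, 55

# Broke my leg :(
"2012, Mar-11", Wii, 0, 60
'''

def read_workouts(handle):
    """Yield (date, kind, miles, minutes), skipping comments and blank lines."""
    data_lines = (line for line in handle
                  if line.strip() and not line.lstrip().startswith("#"))
    # skipinitialspace drops the space that follows each comma.
    reader = csv.reader(data_lines, skipinitialspace=True)
    for date_text, kind, miles, minutes in reader:
        date = datetime.strptime(date_text, "%Y, %b-%d").date()
        yield date, kind, float(miles), float(minutes)

def plot_workouts(workouts):
    """Plot minutes worked out against day of the month (needs matplotlib)."""
    import matplotlib.pyplot as plt
    days = [w[0].day for w in workouts]
    minutes = [w[3] for w in workouts]
    plt.plot(days, minutes, marker="o")
    plt.xlabel("Day of month")
    plt.ylabel("Time worked out (min)")
    plt.show()

workouts = list(read_workouts(io.StringIO(SAMPLE)))
print(workouts)
```

To work from the real file, pass `open("workout.csv", newline="")` to `read_workouts`, then call `plot_workouts(workouts)` to draw the graph. Note that `%b` matches month abbreviations in the current locale, so "Mar" assumes an English locale.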