As my fellow Ph.D. student Eric Holk recently discussed on his blog, I've been running eye-tracking experiments with programmers of different experience levels. In the experiment, a programmer is tasked with predicting the output of 10 short Python programs. A Tobii TX300 eye tracker records their gaze at 300 Hz, letting me see where they spend their time.
Eric's blog post has a video of him reading one of the longer programs in the study, and it's interesting to see how he differs from a novice reading a version of the same program:
When contrasting this with Eric's video, a few things stand out to me. First, Eric's eye movements are precise and directed from the beginning. He quickly finds the first print statement and jumps back to comprehend the between function. The novice, on the other hand, spends time skimming the whole program before tackling the first print statement. This is in line with expectations, of course, but it's cool to see it come out in the data.
Another thing that stands out is the pronounced effect of learning in both videos. As Eric pointed out, it appears that he "compiled" the between function in his head, since his second encounter with it doesn't require a lengthy stop back at the definition. The novice received an inline version of the same program, where the functions were not present. Nevertheless, we can see a sharp transition in reading style around 1:30, once the pattern has been recognized. Rather than browsing through many different lines, the novice begins to focus heavily on just the relevant numbers. This change in style carries over to the final few lines, where she makes quick work of the inlined common function (though she forgets to remove the extraneous comma!).
Cool, huh?
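For readers who haven't watched the videos, here's a sketch of the *style* of program used in the study. It's a reconstruction, not an actual stimulus: the real programs differ, but they involve a between function, a common function, and two lists to compare, and the task is to predict the printed output.

```python
# A reconstruction of the style of program in the study -- not an
# actual stimulus. The task: predict what gets printed.

def between(numbers, low, high):
    # Keep the numbers strictly between low and high.
    result = []
    for n in numbers:
        if low < n < high:
            result.append(n)
    return result

def common(list1, list2):
    # Keep the values that appear in both lists.
    result = []
    for item in list1:
        if item in list2:
            result.append(item)
    return result

x = [2, 8, 7, 9, -5, 0, 2]
y = [1, -3, 10, 0, 8, 9, 1]

print(between(x, 0, 10))
print(between(y, -2, 9))
print(common(x, y))
```

The novice's inline version replaces each function call with the equivalent loop written out in place, so there are no definitions to jump back to.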
In the last 30 seconds or so of the novice video above, you can see her back-and-forth comparison of the x and y lists. If you look carefully, however, the red dot (her gaze point) is often undershooting the numbers on both lists. Why is this? While it could be a miscalibration of the eye tracker, the participant may also have been using her parafoveal vision (the region just outside the fovea) to read the numbers. This, along with the fact that foveation and visual attention are not necessarily the same (i.e., looking at something doesn't always mean you're thinking about it), encourages us to be cautious when interpreting eye-tracking data.
I'm interested in helping the Psychology of Programming research community develop a measurable notion of usability for programming languages. There are decades of research on code comprehension, but no real way to quantify the difference in usability between something like single and multiple inheritance. To make progress, I believe we need to model what's going on in programmers' heads. Models of this kind exist today, but they're box-and-arrow diagrams; nothing you can execute with some code as input.
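To make "executable" concrete, here's a toy sketch. It is entirely my own invention for illustration, not a real cognitive model: it scans code token by token and counts how often a small, fixed-capacity working memory would overflow.

```python
# A toy, entirely hypothetical illustration of an *executable* model,
# as opposed to a box-and-arrow diagram. Nothing here is cognitively
# validated; it only shows the shape of the idea.

def simulate_reading(lines, capacity=4):
    """Count how often a reader with limited memory 'forgets' a name."""
    memory = []      # identifiers currently held in working memory
    forgotten = 0    # times an item was pushed out by newer material
    for line in lines:
        # Crude tokenization: strip punctuation, split on whitespace.
        cleaned = line.replace("(", " ").replace(")", " ").replace(",", " ")
        for token in cleaned.split():
            if token.isidentifier() and token not in memory:
                memory.append(token)
                if len(memory) > capacity:
                    memory.pop(0)    # oldest item decays
                    forgotten += 1
    return forgotten

program = ["x = between(numbers, low, high)",
           "print(common(x, y))"]
print(simulate_reading(program))
```

A real model would need far more than a bounded list, of course, but even this caricature takes code as input and produces a measurement as output, which a diagram cannot.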
My dream is to develop a tool that incorporates a computational cognitive model of the programmer, one that can "read" code in much the same way (or at least close enough to be useful). With such a tool, you could predict how complex a piece of code is by feeding it to the model and measuring aspects of the resulting "mental" representation. The model could start with some initial set of long-term memory contents, representing the programmer's expertise or familiarity with the codebase. In the short term, this could replace simple structural complexity metrics, like lines of code and cyclomatic complexity. In the long term, the tool could help drive mainstream language design.
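For reference, here's a minimal sketch of the kind of structural metric I mean, assuming a simplified definition of cyclomatic complexity (one plus the number of branch points) computed over Python's ast module. The set of "decision" nodes below is a simplification, not a faithful implementation of the metric.

```python
# A rough sketch of cyclomatic complexity (1 + number of branch points),
# computed from Python's AST. The node set is a simplification of the
# real metric.
import ast

DECISIONS = (ast.If, ast.For, ast.While, ast.IfExp,
             ast.ExceptHandler, ast.BoolOp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

snippet = "for n in ns:\n    if n > 0:\n        print(n)"
print(cyclomatic_complexity(snippet))  # 3: straight path + loop + branch
```

Notice that this metric gives the same score no matter who is reading; a cognitive model could score the same snippet differently depending on what the reader already knows.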
This research is exciting, but very preliminary. If you're interested in empirically-driven programming language design in the here-and-now, I suggest looking at Andreas Stefik's Quorum programming language.
If you're in the Bloomington, IN area and would like to participate in my experiment, send me an e-mail. Tim the Terminal will thank you!