I recently came across a very interesting experiment in Eran Avidan’s Master’s thesis. Regular readers will know of my interest in identifiers; while everybody agrees that identifier names have a significant impact on the effort needed to understand code, reliably measuring this impact has proven to be very difficult.
The experimental method looked like it would have some impact on subject performance, but I was not expecting a huge impact. Avidan’s advisor was Dror Feitelson, who kindly provided the experimental data, answered my questions and provided useful background information (Dror is also very interested in empirical work and provides a pdf of his book+data on workload modeling).
Avidan asked subjects to figure out what a particular method did, timing how long it took them to work this out. In the control condition a subject saw the original method; in the experimental condition the method name was replaced by xxx and the local and parameter names were replaced by single-letter identifiers. The hypothesis was that subjects would take longer on methods modified to use ‘random’ identifier names.
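The methods used in the experiment were Java string utilities; the following hypothetical Python sketch (my own illustration, not taken from the thesis materials) shows what the two conditions might look like for a method in the spirit of replaceChars:

```python
# Hypothetical sketch of the experimental treatment (the study used Java
# methods; this Python version is for illustration only).

# Control condition: original method, descriptive names.
def replace_chars(text, search_chars, replacement_chars):
    """Replace each occurrence of a character in search_chars with the
    character at the same position in replacement_chars."""
    table = str.maketrans(search_chars, replacement_chars)
    return text.translate(table)

# Experimental condition: method name replaced by xxx, locals and
# parameters reduced to single letters.
def xxx(a, b, c):
    d = str.maketrans(b, c)
    return a.translate(d)

print(replace_chars("abcba", "ab", "xy"))  # xycyx
print(xxx("abcba", "ab", "xy"))            # same result
```

The two functions are behaviourally identical; only the information carried by the names differs.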
A wonderfully simple idea that does not involve a lot of experimental overhead and ought to be runnable under a wide variety of conditions, plus the difference in performance is very noticeable.
The think aloud protocol was used, i.e., subjects were asked to speak their thoughts as they processed the code. Having to do this will slow people down, but has the advantage of helping to ensure that a subject really does understand the code. An overall slower response time is not important because we are interested in differences in performance.
Each of the nine subjects sequentially processed six methods, with the methods randomly assigned as controls or experimental treatments (of which there were two, locals first and parameters first).
The procedure, when a subject saw a modified method, was as follows: the subject was asked to explain the method’s purpose; once an answer was given, either the local or the parameter names were revealed and the subject again explained the method’s purpose; when an answer was given, the names of both locals and parameters were revealed and a final answer recorded. The time taken for the subject to give a correct answer was recorded.
The summary output of a fitted mixed-effects model is at the end of this post (code+data; original experimental materials). There are only enough measurements to have subject as a random effect on the treatment; no order-of-presentation data is available to look for learning effects.
Subjects took longer for modified methods. When parameters were revealed first, subjects were 268 seconds slower (on average), and when locals were revealed first, 342 seconds slower (the standard deviations of the between-subject differences were 187 and 253 seconds, respectively; that these are smaller than the treatment effects is surprising, perhaps a consequence of the progressive reveal of information helping the slower performers).
Why is the slowdown smaller when parameter names are revealed first? My thoughts: parameter names (if well chosen) provide clues about what the incoming values represent, useful information for figuring out what a method does. Locals are somewhat self-referential, in that they hold local information, often derived from parameters as initial values.
What other factors could impact subject performance?
The number of occurrences of each name in the body of the method provides an opportunity to deduce information; I think the time needed to figure out what the method does should be less when there are many uses of locals/parameters than when there are few.
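The occurrence counts that this factor would be based on are easy to extract mechanically; a minimal Python sketch (my own, using the standard library ast module, not part of the experiment) tallies identifier uses in a function body:

```python
import ast
from collections import Counter

def name_occurrences(source: str) -> Counter:
    """Count how often each identifier appears as a name use
    (reads and writes) in the given source code."""
    tree = ast.parse(source)
    return Counter(
        node.id for node in ast.walk(tree) if isinstance(node, ast.Name)
    )

# An obfuscated method in the style of the experimental treatment.
src = """
def xxx(a, b):
    c = 0
    for d in a:
        if d == b:
            c = c + 1
    return c
"""
print(name_occurrences(src))  # c is used most often, then d, then a and b
```

A name with many uses (here c) offers more contextual clues to a reader than one mentioned only once.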
The ability of subjects to recognize what the code does is also important, i.e., subject code reading experience.
There are lots of interesting possibilities that can be investigated using this low cost technique.
Linear mixed model fit by REML ['lmerMod']
Formula: response ~ func + treatment + (treatment | subject)
REML criterion at convergence: 537.8
Scaled residuals:
     Min       1Q   Median       3Q      Max
-1.34985 -0.56113 -0.05058  0.60747  2.15960
Random effects:
 Groups   Name                      Variance Std.Dev. Corr
subject (Intercept) 38748 196.8
treatmentlocals first 64163 253.3 -0.96
treatmentparameters first 34810 186.6 -1.00 0.95
Residual 43187 207.8
Number of obs: 46, groups: subject, 9
Fixed effects:
                          Estimate Std. Error t value
(Intercept) 799.0 110.2 7.248
funcindexOfAny -254.9 126.7 -2.011
funcrepeat -560.1 135.6 -4.132
funcreplaceChars -397.6 126.6 -3.140
funcreverse -466.7 123.5 -3.779
funcsubstringBetween -145.8 125.8 -1.159
treatmentlocals first 342.5 124.8 2.745
treatmentparameters first 267.8 106.0 2.525
Correlation of Fixed Effects:
(Intr) fncnOA fncrpt fncrpC fncrvr fncsbB trtmntlf
funcrepeat -0.490 0.613
fncrplcChrs -0.526 0.657 0.620
funcreverse -0.510 0.651 0.638 0.656
fncsbstrngB -0.523 0.655 0.607 0.655 0.648
trtmntlclsf -0.505 -0.167 -0.182 -0.160 -0.212 -0.128
trtmntprmtf -0.495 -0.184 -0.162 -0.184 -0.228 -0.213 0.673