[Benchmarking] Proposal: add standard deviation to the graphical result

Martin Desruisseaux martin.desruisseaux at geomatys.fr
Wed Aug 3 12:21:35 EDT 2011


Hello all

Given that the execution time may vary slightly every time a team runs a suite of 
tests (maybe more in Java than in C/C++), we are supposed to run the same suite of 
tests many times in order to obtain meaningful statistics. In his 
"/Performance Anxiety/" talk 
(http://www.devoxx.com/display/Devoxx2K10/Performance+Anxiety), Joshua Bloch 
suggests that 40 executions is a minimum. I'm somewhat neutral on the number of 
runs. However, I would like every team to save the execution time of each 
individual run, so we can do statistics. More specifically, I suggest that the 
curve to be shown at FOSS4G be the average of all execution times (minus the first 
executions for Java applications, because of JVM "warm up" time), together with 
the standard deviation. The standard deviation was missing from last year's graphics.
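
To make the proposal concrete, here is a small sketch in Java of the statistic I 
have in mind (any spreadsheet does the same; the class name, the method name and 
the number of warm-up runs to skip are only illustrative):

    // Minimal sketch: mean and sample standard deviation of the execution
    // times of one benchmark, skipping the first runs which are distorted
    // by JVM warm-up. Times are assumed to be in milliseconds.
    public class TimingStats {
        public static double[] meanAndStdDev(double[] timesMillis, int warmupRuns) {
            final int n = timesMillis.length - warmupRuns;
            double sum = 0;
            for (int i = warmupRuns; i < timesMillis.length; i++) {
                sum += timesMillis[i];
            }
            final double mean = sum / n;
            double squares = 0;
            for (int i = warmupRuns; i < timesMillis.length; i++) {
                final double d = timesMillis[i] - mean;
                squares += d * d;
            }
            final double stdDev = Math.sqrt(squares / (n - 1));  // Sample standard deviation.
            return new double[] {mean, stdDev};
        }
    }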

I can take care of producing the graphics at FOSS4G once I have the data 
(every spreadsheet has those basic statistical tools). The reasons why I want the 
standard deviation are:

  * It shows whether the execution time of an application is rather stable or
    varies a lot.
  * If an application appears faster than another one, the standard deviation
    tells us the probability that the first application is really faster, i.e.
    that the difference is not a matter of luck due to random variations in
    execution time. (This point assumes that the execution times have a Gaussian
    distribution, but this is probably the case and we can verify that from the
    raw data.) See the sketch after this list.
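
As a rough illustration of the second point (only a sketch; a proper Student 
t-test in a spreadsheet would be more rigorous, and the Gaussian assumption still 
has to be verified from the raw data), a method which could go in the same 
illustrative class as above:

    // Rough sketch, assuming approximately Gaussian execution times: returns
    // true if application 1 appears faster than application 2 by more than
    // about two standard errors, i.e. the difference is unlikely to be only
    // random variation in execution time.
    public static boolean probablyFaster(double mean1, double stdDev1, int n1,
                                         double mean2, double stdDev2, int n2) {
        // Standard error of the difference between the two mean execution times.
        final double se = Math.sqrt((stdDev1 * stdDev1) / n1 + (stdDev2 * stdDev2) / n2);
        return (mean2 - mean1) > 2 * se;   // Roughly a 95% confidence level.
    }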


I can take care of the stats. I basically just ask that we agree on how many 
times each suite of tests shall be run, and that each team records all their raw 
data (the execution time of each individual run).

     Regards,

     Martin
