[Benchmarking] Proposal: add standard deviation to the graphical result

Smith, Michael ERDC-CRREL-NH Michael.Smith at usace.army.mil
Wed Aug 3 12:36:11 EDT 2011


This makes sense to me. I like it.

Mike


-------
Michael Smith
Remote Sensing/GIS Center
US Army Corps of Engineers



From: benchmarking-bounces at lists.osgeo.org [mailto:benchmarking-bounces at lists.osgeo.org] On Behalf Of Martin Desruisseaux
Sent: Wednesday, August 03, 2011 12:22 PM
To: benchmarking-osgeo
Subject: [Benchmarking] Proposal: add standard deviation to the graphical result

Hello all

Given that every time a team runs a suite of tests the execution time may vary slightly (maybe more in Java than in C/C++), we are supposed to run the same suite of tests many times in order to gain meaningful statistics. In his "Performance Anxiety" talk (http://www.devoxx.com/display/Devoxx2K10/Performance+Anxiety), Joshua Bloch suggests that 40 executions is a minimum. I'm somewhat neutral on the number of runs. However I would like every team to save the execution time of each individual run, so we can do statistics. More specifically, I suggest that the curve shown at FOSS4G be the average of all execution times (minus the first executions for Java applications, because of JVM "warm up" time), together with the standard deviation. The standard deviation was missing from last year's graphics.
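
A minimal sketch (in Python, with a made-up warm-up count and made-up run times, just to illustrate the idea) of the statistics proposed above: discard the first runs as JVM warm-up, then report the mean and standard deviation of the remaining execution times.

    import statistics

    WARMUP_RUNS = 5  # assumed number of warm-up runs to discard (not agreed yet)

    def summarize(times_in_seconds):
        """Return (mean, standard deviation) of the measured runs after warm-up."""
        measured = times_in_seconds[WARMUP_RUNS:]
        return statistics.mean(measured), statistics.stdev(measured)

    # Example with invented numbers: 8 runs of the same test suite.
    runs = [12.4, 11.9, 11.2, 11.0, 10.9, 10.8, 11.0, 10.9]
    mean, stdev = summarize(runs)
    print(f"mean = {mean:.2f} s, standard deviation = {stdev:.2f} s")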

I can take care of producing the graphics at FOSS4G once I have the data (every spreadsheet has those basic statistical tools). The reasons why I want the standard deviation are:

  *   It shows whether the execution time of an application is rather stable or varies a lot.
  *   If an application appears faster than another one, the standard deviation tells us the probability that the first application is really faster, i.e. that the difference is not just luck due to random variations in execution time (a rough sketch of this comparison follows below). This point assumes that the execution times have a Gaussian distribution, but this is probably the case, and we can verify that from the raw data.
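
A rough sketch (Python, illustrative only, all numbers invented) of the kind of comparison the standard deviation enables: given the mean, standard deviation and number of measured runs for two applications, estimate how likely it is that the apparently faster one is really faster. It uses a normal approximation, consistent with the Gaussian assumption above.

    import math

    def probability_really_faster(mean_a, stdev_a, n_a, mean_b, stdev_b, n_b):
        """Probability that application A is really faster than B (mean_a < mean_b)."""
        standard_error = math.sqrt(stdev_a**2 / n_a + stdev_b**2 / n_b)
        z = (mean_b - mean_a) / standard_error
        # Normal cumulative distribution function evaluated at z.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    # Invented numbers: A averages 10.9 s, B averages 11.3 s, 40 runs each.
    print(probability_really_faster(10.9, 0.2, 40, 11.3, 0.3, 40))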

I can take care of the stats. I basically just ask that we agree on how many times each suite of tests shall be run, and that each team record all their raw data (the execution time of each individual run).

    Regards,

    Martin

