[Benchmarking] data block caching technique
adrian.custer at geomatys.fr
Mon Sep 6 04:58:03 EDT 2010
On Mon, 2010-09-06 at 09:38 +0200, Luc Donea wrote:
> We think that this combination of unrealistic test conception and data
> block caching technique is unfair to the other participants and will make
> their results look bad, while they might perform as well or even
> better in a real-world use case.
We tried to raise this issue early on by saying that everyone in the
benchmarking effort needed to agree on what kind of setup we were trying
to mimic in the benchmark, so that we could then build tests which
reasonably represented that setup.
Because we did not do that work, it seems we have stumbled into an edge
case for which some servers are able to work only from main memory. When
we agreed to use the current tiny raster data set (compared to the 1.3 TB
full .ecw dataset for all of Spain), we realized that we would not be
benchmarking a real, industrial dataset. However, we did not know that
it would be just small enough that, coupled with repeated request sets,
some servers would be working from main memory.
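To make the scale gap concrete, a rough back-of-the-envelope check shows why the full Spain dataset could never be served from memory while the tiny test subset can be. The RAM figure below is an invented example for illustration, not a measurement of any team's server:

```shell
# Hypothetical comparison: the ~1.3 TB figure is the full .ecw Spain
# dataset mentioned above; the 32 GB of RAM is an assumed example.
dataset_bytes=$((1300 * 1024 * 1024 * 1024))  # ~1.3 TB full dataset
ram_bytes=$((32 * 1024 * 1024 * 1024))        # assumed 32 GB server RAM

if [ "$dataset_bytes" -gt "$ram_bytes" ]; then
    verdict="full dataset cannot be cached in RAM"
else
    verdict="full dataset fits in RAM"
fi
echo "$verdict"
```

The same arithmetic run against the small benchmark raster goes the other way, which is exactly the edge case we stumbled into.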
> I think that everyone should publish all 3 run results and guarantee
> that these have been measured just after a server restart. We would
> also like the teams using such a technique to rerun their tests after
> disabling it.
The question of how to resolve this situation is more difficult.
We had a vote on which scripts to use, and the vote result was in favour
of switching. Seeing the results of the vote, our team started all our
runs with the newer scripts.
However, the vote seems to have been totally ignored. I personally do
not like working through this voting process but would rather work
through the slower but more friendly and productive process of getting
everyone to agree on a consensus position. Nonetheless, up until the
script vote, everything in this benchmarking process was done through
voting. I am puzzled as to why, on this issue, the vote was ignored.
The proposal you make, Luc, would be difficult to follow. I imagine few
of the teams bothered to flush the memory cache before making their
runs. I have observed Constellation-SDI both after filling the caches
and after emptying them---the results are, unsurprisingly, totally
different. So your proposal boils down to every team re-running a full
set of benchmarks from a cold start.
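For reference, on a Linux host the kind of cold-start run Luc asks for can be approximated by stopping the server under test, dropping the kernel page cache, and restarting before each run. This is only a sketch: "wms-server" is a placeholder service name, not any team's actual init script, and the drop_caches knob is Linux-specific:

```shell
# Cold-start sketch (requires root). "wms-server" is a hypothetical
# service name standing in for whichever server a team is benchmarking.
sudo service wms-server stop   # stop the server under test
sync                           # flush dirty pages to disk first
# 3 = free the page cache plus dentries and inodes
echo 3 | sudo tee /proc/sys/vm/drop_caches
sudo service wms-server start  # restart so the run begins with cold caches
```

Running something like this between runs would at least put every server on the same cold-cache footing, whatever we decide about the scripts themselves.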
I am open to any productive suggestion of how to resolve the issue. We
could easily generate newer, shorter scripts, say with only one or two
thread combinations, that test servers as if they were serving the whole
imagery dataset for Spain, yet still allow runs for all the teams to
finish in the remaining time. We might even be able to make the runs for all
servers during our meeting time on the 7th. It would seem generally a
good strategy anyhow to run the benchmarks all together so everyone can
follow what is happening.