<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7638.1">

<TITLE>RE : [Benchmarking] data block caching technique</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->


<P><FONT SIZE=2>Hi Adrian and all,<BR>

<BR>

Thanks for your quick answer Adrian. I like your idea to generate newer, shorter scripts.<BR>

<BR>

I would not limit the tests to only two threads; I think it remains important to show server scalability at 64 users. We could run 1, 4 and 64 users to keep it short. The important thing to me is to create a different csv for the 2nd and 3rd runs. This would ensure a more realistic result and still allow servers to warm up. I also think we should rerun not only the imagery tests but also the vector tests. Do you think you would be able to create these tests? I must admit that I am unable to do that... :-)<BR>

<BR>

It would indeed be really nice to run the benchmarks all together so everyone can follow what is happening, but I guess that running 16 tests and starting 8 servers can take quite a while. It could easily turn into a 3-hour session.<BR>

<BR>

I'd like to know what the others think about this, and what everyone's availability would be on Tuesday.<BR>

<BR>

Thanks,<BR>

<BR>

Luc<BR>

<BR>

-------- Original message --------<BR>

From: benchmarking-bounces@lists.osgeo.org on behalf of Adrian Custer<BR>

Date: Mon. 06/09/2010 04:58<BR>

To: benchmarking@lists.osgeo.org<BR>

Subject: Re: [Benchmarking] data block caching technique<BR>

<BR>

Hey all,<BR>

<BR>

<BR>

On Mon, 2010-09-06 at 09:38 +0200, Luc Donea wrote:<BR>

&gt;<BR>

&gt; We think that this combination of unrealistic test design and data<BR>

&gt; block caching technique is unfair to other participants and will make<BR>

&gt; their results look bad, while they might perform as well or even<BR>

&gt; better in a real world use-case.<BR>

&gt;<BR>

We tried to raise this issue early on by saying that all those in the<BR>

benchmarking effort really needed to agree on what kind of a setup we<BR>

were trying to mimic in the benchmark so that we could then build tests<BR>

which reasonably represented that setup.<BR>

<BR>

Because we did not do that work, it seems we have stumbled into an edge<BR>

case for which some servers are able to work only from main memory. When<BR>

we agreed to use the current tiny raster data set (compared to the 1.3 TB<BR>

full .ecw dataset for all of Spain), we realized that we would not be<BR>

benchmarking a real, industrial dataset. However, we did not know that<BR>

it would be just small enough that, coupled with repeated request sets,<BR>

some servers would be working from main memory.<BR>

<BR>

<BR>

&gt; I think that everyone should publish all 3 run results and guarantee<BR>

&gt; that these have been measured just after a server restart. We would<BR>

&gt; also like those using such a technique to rerun their tests after<BR>

&gt; disabling it.<BR>

<BR>

The question of how to resolve this situation is more difficult.<BR>

<BR>

<BR>

We had a vote on which scripts to use, and the vote result was in favour<BR>

of switching. Seeing the results of the vote, our team started all our<BR>

runs with the newer scripts.<BR>

<BR>

However, the vote seems to have been totally ignored. I personally do<BR>

not like working through this voting process but would rather work<BR>

through the slower but more friendly and productive process of getting<BR>

everyone to agree on a consensus position. Nonetheless, up until the<BR>

script vote, everything in this benchmarking process was done through<BR>

voting. I am puzzled as to why, on this issue, the vote was ignored.<BR>

<BR>

<BR>

<BR>

The proposal you make, Luc, would be difficult to follow. I imagine few<BR>

of the teams bothered to flush the memory cache before making their<BR>

runs. I have observed Constellation-SDI both after filling the caches<BR>

and after emptying them; the results are, unsurprisingly, totally<BR>

different. So your proposal boils down to every team re-running a set of<BR>

benchmarks.<BR>

<BR>

I am open to any productive suggestion of how to resolve the issue. We<BR>

could easily generate newer, shorter scripts, say with only one or two<BR>

thread combinations to test servers as if they were serving the whole<BR>

imagery dataset for Spain yet be able to complete runs for all the teams<BR>

in the remaining time. We might even be able to make the runs for all<BR>

servers during our meeting time on the 7th. It would seem generally a<BR>

good strategy anyhow to run the benchmarks all together so everyone can<BR>

follow what is happening.<BR>

<BR>

<BR>

--adrian custer<BR>

<BR>

<BR>

<BR>

<BR>

_______________________________________________<BR>

Benchmarking mailing list<BR>

Benchmarking@lists.osgeo.org<BR>

<A HREF="http://lists.osgeo.org/mailman/listinfo/benchmarking">http://lists.osgeo.org/mailman/listinfo/benchmarking</A><BR>

<BR>

</FONT>

</P>


</BODY>

</HTML>