[Benchmarking] Ideas for next year

thomas bonfort thomas.bonfort at gmail.com
Wed Sep 21 12:44:39 EDT 2011


I definitely agree that the exercise should be changed for next year,
as the barrier to entry of this year's exercise was too high:

- rendering a complete map for all zoom levels put too much of the effort
into creating the mapfiles/slds, and not enough into actual rendering
improvements

- using the mapserver mapfile as a starting point was a difficult step
for the other teams to overcome, especially as that mapfile isn't
particularly legible (it is automatically generated, not hand
written). The sheer size of the mapfile inevitably caused translation
errors (e.g. the mapnik carto was hitting generalized tables at zoom
levels where the mapfile was using the full-resolution ones). I've
published an updated chart comparing the results and showing that this
had a very visible impact on the published results:
https://docs.google.com/spreadsheet/oimg?key=0AnCiCdIXpTHidFdqQkRpQjEzOTdkRjVJWHNCSXByWGc&oid=1&zx=gynyn5i9yra1

- I suspect that the exercise wasn't even completely fair, as there are
some time-consuming concepts in that mapfile that probably can't be
translated to sld/carto, and vice-versa

- Although mapserver had a nice head start by not having to create the
mapfile, our team was penalized by the fact that we had very little
room to optimize the mapfile for performance, as all the other
teams were replicating its functionality

- The bottleneck (at least for mapserver) was the database queries. I
know this is the opportunity to optimize the queries being sent, but
in the end we weren't really showing the performance of the actual WMS
servers (a sketch of the scale-dependent queries involved follows just
below).
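
To make that concrete, here is a rough sketch of the kind of
scale-dependent table switching every style translation had to
reproduce exactly; the table names and scale cut-offs below are made
up for illustration, not the benchmark's actual configuration:

# Illustrative only: the table names and scale thresholds are invented,
# not the real benchmark values.
GENERALIZATION_RULES = [
    # (maximum scale denominator, table to query)
    (100000,  "planet_osm_roads_full"),    # full resolution close up
    (1000000, "planet_osm_roads_gen50"),   # simplified geometries
    (None,    "planet_osm_roads_gen500"),  # heavily generalized overview
]

def table_for_scale(scale_denominator):
    """Pick the source table for a given map scale denominator."""
    for max_scale, table in GENERALIZATION_RULES:
        if max_scale is None or scale_denominator <= max_scale:
            return table

def build_query(scale_denominator, bbox, srid=900913):
    """Build the spatial query one engine would send for a single layer."""
    table = table_for_scale(scale_denominator)
    minx, miny, maxx, maxy = bbox
    return ("SELECT way FROM %s WHERE way && ST_MakeEnvelope"
            "(%f, %f, %f, %f, %d)" % (table, minx, miny, maxx, maxy, srid))

# Getting one threshold wrong means one engine scans the full-resolution
# table where another scans a generalized one, for the same request.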

A suggestion for next year would be to split the exercise into multiple parts:

- A rendering performance part for a scale-independent map with 3
layers (point, line, polygon). The actual symbology used for those
layers could be rather complex, but I think it will be less
error-prone and easier to implement than this year's multi-level map.
The output of this exercise would not be visually pleasing; it would
only assess the rendering performance of each solution. The actual
requests sent for this part could be for the seeding of the area of
interest at a given number of zoom levels (see the seeding sketch
after this list), and we could factor out the WMS part for those teams
that want to implement another generation scheme.

- A rendering quality test, where we define an objective output for a
given extent and image size, and each team must try to get as close to
it as possible visually. This one is harder to assess with hard
numbers (a rough per-pixel comparison is sketched after this list),
but the rendering time could be taken into account even if it is not
the objective of the test.

- Possibly a wms conformance test, if some teams want to set that up.
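
For the seeding part mentioned in the first point, the request list
could be generated straight from a tile grid over the area of
interest. A minimal sketch; the endpoint, extent, layer names and zoom
levels below are placeholders, not a proposal for the actual values:

import math

WMS_URL = "http://localhost/wms"        # placeholder endpoint
AOI = (-105.3, 39.5, -104.6, 40.1)      # placeholder lon/lat extent
TILE_SIZE = 256

def seeding_requests(aoi, zoom_levels, layers="points,lines,polygons"):
    """Yield one GetMap URL per 256x256 tile covering the AOI at each zoom."""
    minx, miny, maxx, maxy = aoi
    for z in zoom_levels:
        deg = 360.0 / (2 ** z)          # simple lon/lat tiling
        for col in range(int((minx + 180.0) // deg), int((maxx + 180.0) // deg) + 1):
            for row in range(int((miny + 90.0) // deg), int((maxy + 90.0) // deg) + 1):
                x0, y0 = col * deg - 180.0, row * deg - 90.0
                bbox = "%f,%f,%f,%f" % (x0, y0, x0 + deg, y0 + deg)
                yield ("%s?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
                       "&LAYERS=%s&SRS=EPSG:4326&BBOX=%s"
                       "&WIDTH=%d&HEIGHT=%d&FORMAT=image/png"
                       % (WMS_URL, layers, bbox, TILE_SIZE, TILE_SIZE))

# Example: dump the full request list for three zoom levels
for url in seeding_requests(AOI, zoom_levels=[6, 7, 8]):
    print(url)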
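
For the quality part, the per-pixel comparison mentioned above could at
least attach a rough number to "as close as possible visually". A
minimal sketch, assuming numpy and PIL are available and both PNGs have
the same size; the file names are placeholders:

import numpy as np
from PIL import Image

def rms_difference(reference_png, candidate_png):
    """Root-mean-square per-pixel difference between two same-size images.

    0 means identical; this is only a coarse proxy for visual closeness,
    so it would complement human judgement, not replace it.
    """
    ref = np.asarray(Image.open(reference_png).convert("RGB"), dtype=float)
    cand = np.asarray(Image.open(candidate_png).convert("RGB"), dtype=float)
    if ref.shape != cand.shape:
        raise ValueError("images must have the same dimensions")
    return float(np.sqrt(np.mean((ref - cand) ** 2)))

# Example (placeholder file names):
# print(rms_difference("reference.png", "team_output.png"))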

As for the confidence interval, I am not opposed, but I think it will be
difficult to set up without requesting exactly the same data over and
over again, and that will raise the same concerns about data caching as
last year.
With this year's exercise those confidence numbers would have been
meaningless anyway, as within a given run there were very different kinds
of requests being rendered (e.g. from a 20x20 map with no features
to an 800x800 map of a densely featured area).
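
If we do want them anyway, one option would be to report an interval
per class of request instead of over the whole mixed run. A rough
sketch; the class labels and data layout are assumptions, and it uses a
simple normal-approximation interval over per-request response times:

import math
from collections import defaultdict

def confidence_intervals(samples, z=1.96):
    """95% normal-approximation CI of mean response time per request class.

    samples is an iterable of (request_class, response_time_seconds);
    the classes (e.g. "20x20 empty" vs "800x800 dense") are assumptions
    that the benchmark would have to define.
    """
    by_class = defaultdict(list)
    for req_class, seconds in samples:
        by_class[req_class].append(seconds)

    results = {}
    for req_class, times in by_class.items():
        n = len(times)
        mean = sum(times) / n
        if n > 1:
            var = sum((t - mean) ** 2 for t in times) / (n - 1)
            half = z * math.sqrt(var / n)
        else:
            half = float("nan")  # a single sample gives no interval
        results[req_class] = (mean, mean - half, mean + half)
    return results

# Example:
# confidence_intervals([("20x20 empty", 0.012), ("800x800 dense", 0.94)])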

Best regards, and please comment and/or come up with other/more
improvements for next year,

Thomas


On Tue, Sep 20, 2011 at 15:39, thomas bonfort <thomas.bonfort at gmail.com> wrote:
> ---------- Forwarded message ----------
> From: Vincent Heurteaux <vincent.heurteaux at geomatys.fr>
> Date: Sat, Sep 17, 2011 at 18:16
> Subject: Re: [Benchmarking] Ideas for next year
> To: "Performance testing of OSGeo and other web service engines."
> <benchmarking at lists.osgeo.org>
>
>
> Ok, please excuse the rudeness of Johann's e-mail; after talking with him,
> you can do "sed 's/upset/disappointed/' johann's_email" and so on ...
>
> His frustration with this exercise is entirely my fault. Due to a lack of
> time to spend on this game (and it is a game, IMHO), he had to work night
> and day in a really short time period to keep Constellation in the
> competition.
> I therefore commit to giving Martin and Johann more time next year
> to prepare the use cases and play the game in the right conditions.
>
> Cheers,
>
> Vincent
>
> On 17 Sep 2011, at 09:39, johann.sorel at geomatys.com wrote:
>
>> Hi,
>>
>> Just my thinking: last year's event had many more competitors; this year it was mainly a Mapnik vs Mapserver affair.
>> So I understand you want to compare both projects.
>>
>> This year was my first participation as a developer for the Constellation server, and I have been really upset most of the time.
>> I was hoping to work on improvements to our engine, but in the end I spent 70% of my time on a parser to convert
>> Mapfile to SLD. Without this effort both Constellation and Geoserver would have been out of the bench.
>>
>> So definitely, if we intend to have more competitors next year (and not even fewer), there is a need to describe the objective in a neutral way for all teams, both styling and data. I'm not saying it must be OGC SLD/SE; a text describing the expected result is enough, and each team can then implement it with its own style model.
>>
>> Talking about data: only about 3 or 4 weeks before the bench it was decided to use BIL files for pseudo-hillshading. Since both mapserver and mapnik rely on gdal/ogr they had no problems, but that's not the case for everyone. So I also hope last-minute changes linked to data formats will not happen in the future.
>>
>> I also noticed those tests did not involve vector reprojections. After all, we are providing mapping servers, not painting servers, so reprojection should take a bigger place in the tests. I think running queries in ten or more different projections would be nice.
>>
>> johann
>>
>>
>> On 17/09/2011 07:23, Iván Sánchez Ortega wrote:
>>> Hi all,
>>>
>>> During some beers at the Wynkoop, I had an idea that I think is worth sharing.
>>>
>>> Until now, the results focus on throughput per number of concurrent requests.
>>> This is fine, but other metrics are possible.
>>>
>>> Then, I heard that Mapnik excels at requests with few vector features, while
>>> Mapserver does a very good job when there are many vector features to be
>>> rendered.
>>>
>>> You can guess where this is going. I will propose that, for next year (years?),
>>> requests should be classified into groups depending on the number of features
>>> contained in that extent, e.g. requests with <10 feats, 10-50, 50-100,
>>> 100-500, >500. Measure latency/throughput for every group, and put the results in
>>> a graph.
>>>
>>>
>>>
>>> I don't know if this is feasible. Anyway, will see you tomorrow at the Code
>>> Sprint,
>>

