[Benchmarking] meeting summary

Andrea Aime aaime at opengeo.org
Thu Jun 3 16:38:46 EDT 2010


Adrian Custer wrote:
> Hello all,
> 
> Yesterday's meeting was problematic at many levels. 

A premise before I start a long answer to a long mail: what I'm going to
state in this mail is solely my point of view; I'm not trying to
give definitive directions. I hope others will join and share their
points of view as well.

> 
> From my perspective, the pacing was way too aggressive, trying to reach
> 'rules of engagement' without first agreeing on what the benchmarking
> effort is trying to do.

That was actually covered by the first statement by Jeff, which no one
disputed:

"we should state that this is a friendly exercise. 'winning' is for the 
users, each participant team will enhance their software and all 
enhancements will be contributed back to the software."

The shootout has been, so far, an occasion to compare the performance
of open source software in order to give each community a reason
to improve the software and to participate in its development.
That's the first and foremost goal.

The second element that made this presentation popular is... well...
that it is conceived to be popular. The material is meant to be
presented in a plenary session, with easy to understand results,
possibly with a lively delivery: it is not meant to be material
for an academic track (more on this later).

> For example, there seems to be a notion that it makes sense for a 
>         "'baseline' test with the data in its raw format". 
> This apparently means that servers should be constrained in some way to
> use shapefiles directly, possibly forcing servers to read from the file
> for every request or something similar. I don't understand the exact
> constraint nor why we are mucking around at this level of detail.
> Working on the WMS and other standards at the OGC has trained me
> actively to avoid such dictates so it is hard for me to think in this
> way. Since most WMS servers allow users to use data in shapefile format,
> I am puzzled why the agreement would not instead be for all servers to
> use the shapefile data in the same way that they expect their users to
> use shapefile data by default. A server which forced all its users to
> put their shapefile data into a database would be excluded by the rule
> above so it seems that such a rule, because not generally applicable,
> probably does not make sense.

It actually makes a lot of sense: my working experience tells
me that a system forcing people to put their data into a specific
format is doomed to be a niche one.
So far I've worked in three different places, with different programming
languages and different implementation domains.
Only one thing remained constant: it is not you who chooses the data
storage format, it's the customer.
This probably has a lot to do with the kind of customer I've been dealing
with: small to big public administrations, medium to big private
companies.
All of them share one desire: to standardize IT so that it can be
managed by a smaller number of people, and so that IT workers are
more easily replaceable.

From that point of view a benchmark like last year's, comparing four
popular data storage formats and forcing each server to connect to each
of them, made a lot of sense: each of the above organizations was
actually interested in only one of them.

This year we also want to go "best effort". Fair enough: there are also
places where people actually look for the best data format for a given
piece of software (meaning the software is chosen before the database
management solution) or are simply looking for the best possible outcome
without prerequisites of any sort.
These places are not scared of having to deal with new software,
and thus of fragmenting the set of knowledge needed to manage IT.
In ten years of work I have encountered only two of them, and they
were both research institutes. Maybe it was just bad luck ;-)

As to "why shapefiles" for the baseline format, or "why geotiff",
it's because they are the most commonplace formats, and also because
they are both file based. If you follow the reasoning of the
kind of customer I've been dealing with, file storage does not require
learning new abilities, as you have to deal with files anyways,
that is something you cannot escape in any IT department.
But if you go to a Oracle stronghold telling them to use PostGIS because
it works better (or is the only solution) with your software, well,
unless they are in distress for the licensing costs,
they will laugh at you and shut the door.

Another good reason to use shapefiles and GeoTIFFs is that basically any
product can handle them, and they are likely the first thing a generalist
user will try against a generalist server (yes, this cuts out domain
specific servers; e.g., if I were to look at meteorology specific servers
the formats would probably be NetCDF and HDF).

In conclusion I think doing baseline + best effort would satisfy (to
a point) both audiences, whilst last year's approach was very satisfactory
to the audience that gets a data format imposed on it, and not at all
to the one that feels free to choose whatever is best.

It also serves to keep the comparison contents varied, which
does not hurt (otherwise the presentation would get boring in the long
run).

> Rather than work by constraining how a server acts, other than that it
> follow essentially its default behaviour (i.e. "Don't game the test"), I
> was expecting to start with a discussion of what the server would be
> expected to do, i.e. by discussing the testing regime. Then we could
> work backwards to figure out what kind of data would be necessary to
> expose the strengths and weaknesses of different approaches.

I fully agree the testing regime must be discussed in more detail.

> 
> There are many questions related to establishing what will actually be
> tested by the benchmarks.
> 
>       * How will correctness be handled?
>                 If a server returns bad images, we simply drop it for
>                 that test?

I would say yes. However, we have to define what "bad" means.
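
Just to make that concrete, the kind of automated check I could imagine
is a pixel diff against a reference rendering; below is a rough
Python/PIL sketch, where both the reference idea and the 1% threshold
are just my assumptions, nothing we have agreed on:

from PIL import Image, ImageChops

def looks_bad(candidate_path, reference_path, max_diff_ratio=0.01):
    # "bad" here means: wrong image size, or more than 1% of the pixels
    # visibly different from a rendering we trust
    cand = Image.open(candidate_path).convert("RGB")
    ref = Image.open(reference_path).convert("RGB")
    if cand.size != ref.size:
        return True
    diff = ImageChops.difference(cand, ref)
    bad_pixels = sum(1 for px in diff.getdata() if max(px) > 16)
    return bad_pixels > max_diff_ratio * cand.size[0] * cand.size[1]

Anti-aliasing alone will make two engines differ pixel by pixel, so the
threshold (or a visual inspection step) would need tuning; the point is
only that "bad" needs an operational definition.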

>       * Will benchmarks be run against WMS 1.0 or 1.3?

I believe we're assuming WMS 1.1.
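
For reference, the kind of baseline request I have in mind looks more or
less like this (host, layer name, bounding box and image size are made
up for the example, not part of any agreed dataset):

http://example.org/wms?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap
   &LAYERS=topp:states&STYLES=&SRS=EPSG:4326
   &BBOX=-124.73,24.96,-66.97,49.37&WIDTH=800&HEIGHT=400&FORMAT=image/png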

>       * 
>       * To what extent will the benchmark test CRS's?
>                 Since this is potentially one of the more costly
>                 operations, to what extent would this be tested?

I also agree some reprojection tests should be included.
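
For example, the same GetMap as above but with the SRS switched would
force the server to reproject the EPSG:4326 shapefile data on the fly,
which is exactly the cost we want to measure:

   ...&SRS=EPSG:900913&BBOX=-13884991,2870341,-7455066,6338219&...

(the coordinates are just the rough Mercator equivalent of the box
above; the actual target projections are to be agreed on).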

>       * Will the benchmarks test SLD support?
>                 This is another potentially costly operation.

I don't agree that SLD should be made a requirement for the tests (and
I'm speaking against my own interest here, as GeoServer uses SLD
as its main styling language).
Not even INSPIRE mandates the usage of SLD (it just suggests it); the
standard is complex and has not found general favor among implementors,
and it is usually something added later that provides limited
functionality compared to the native styling abilities (this is true
for GeoServer as well, where the native language is actually a sort
of SLD++).

The grassroots movement towards CSS map styling is quite telling about
this general distress, imho.

> Then there are also many questions related to the test metrics. As best
> as I can make out from the results published from last year's test
>         http://www.slideshare.net/gatewaygeomatics.com/...
>         ...wms-performance-shootout
> the principal metric calculated the average response time calculated
> over a series of requests. As I understand it, this is due to the use of
> JMeter as a testing system. Unfortunately, as all introductory courses
> in statistics spend time exploring, the mean is a particularly poor
> measure of central tendency for certain distributions, of which the
> Erlang is a textbook example. The lack of any measure of variance
> further reduces the conclusions that can be drawn from the published
> tests. I would presume the benchmarking effort would want to produce
> usable results based on robust statistics and that there therefore ought
> to be some discussion of how this could be achieved. When I asked Frank
> if every team would have time to test the other servers, I had in mind
> generating a set of metrics in which I would have confidence, even if
> such metrics do not interest anyone else.

I would certainly be interested in evaluating other tools to drive
the runs and collect statistics; the solution we adopted last year
worked but was not easy to use. Are your tools available to the
general public, and are they something we can use in a multi-platform
setup?

I have reservations about the idea of showing charts with both the
average and the variance in the presentation.
The visual density of the charts would increase and would make for
a harder to read set of charts.
Generally speaking, the presentation is going to be a plenary one,
not an academic track one, so I would prefer to avoid excessive
statistical detail _in the presentation slides_.
However, there are certainly benefits in gathering better statistics:
- we can make up wiki pages with the extra details (just like we'll
   add details about how the "best effort" setup is done)
- we can evaluate that extra information and use it to provide
   interesting highlights for the presentation (as opposed to giving
   the full detail flat out throughout the presentation); see the
   sketch below for the kind of summary I have in mind.
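
To make that last point concrete, this is the kind of summary I'd like
to be able to compute from the raw timings; a throwaway Python sketch,
with the input format and the chosen percentiles invented for the
example:

# toy summary of a list of response times in milliseconds: the mean
# alone hides the long tail, median/p90/p99 make it visible
def summarize(times_ms):
    data = sorted(times_ms)
    n = len(data)
    def pct(p):
        # nearest-rank percentile, good enough for a sketch
        return data[min(n - 1, int(p * n))]
    return {
        "mean": sum(data) / float(n),
        "median": pct(0.50),
        "p90": pct(0.90),
        "p99": pct(0.99),
    }

print(summarize([120, 135, 150, 160, 400, 145, 130, 900, 140, 155]))

Whether the slides then show the median, a percentile or just the
average can be decided later; the raw numbers would be on the wiki
anyway.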

Cheers
Andrea


-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

