[Benchmarking] Test matrix (aka putting tests on a diet)

Andrea Aime aaime at opengeo.org
Sun Sep 27 10:40:30 EDT 2009


Hi,
yesterday Jeff and I were talking about the size of the images that
the tests will use to load the servers: whether it's better to
hit the servers with common tile size requests (256x256) or common
screen size requests (anything between 640x480 and 1024x768, which also
includes the common 768x768 base metatile request that tile caches
love to issue).

The two are quite different setups: they lead to very different results
in terms of requests per second, interact quite differently
with labeling, and so on. It's difficult to say which is the right
approach; it may be interesting to have both.

That made me think it's time to summarize a little which tests we
want to run and what we can actually present. Time is short,
so we should concentrate on having results for what will fit
in the presentation.

The servers we're considering are GeoServer, MapServer and ArcGIS,
each in two versions (stable and current development).

Each round of tests shows how the throughput (or response time)
varies with 1, 10, 20 and 40 concurrent clients; the resulting
presentation would be something similar to last year's, I guess:
http://conference.osgeo.org/index.php/foss4g/2008/paper/view/256/191

In particular, look at test #6, the WFS one, where we have examples
of graphs that compare 4 different configurations (MS vs GS,
GML2 vs GML3, combined). If we keep two versions per server our
typical graph will have 6 bars, taking the data set, the backend
and the type of load as given.

So, vector-wise, what do we have?
Data sets: lines, points and polygons (a round with the three layers
            in one request was suggested too)
Backends: shapefile, postgis, SDE. Direct Oracle has also been
           suggested a lot lately (I know there is interest from me, Jeff,
           and Michael so that we can compare it to PG and to SDE over
           Oracle).
Type of requests: small tiles and screen sized ones

This results in 3 * 4 * 2 different combinations: 24 different
comparisons just for the vectors. Way too many; the presentation
at FOSS4G 2008 contained 12 charts total, spread over 6 different
types of tests.

And we still have to add the raster tests, where we have the ECW
and the mosaic, and there was talk about letting each server choose
its optimal configuration, which will make for 3 more charts at the
very least.

Clearly we need to choose what we want to show in vector land.
I don't think there is a pressing need to compare every possible
configuration; in the end that would just confuse people.
Here is a proposal for which combinations we could present.

Despite liking the idea, I guess we should abandon the idea of
having separate loads by request size, and instead use a uniform
randomized load set spanning from small images to bigger
ones, between 256x256 and 1024x768, uniformly distributed if
possible (the current wms_requests.py privileges bigger images,
if my memory serves me right, using a logarithmic distribution
instead).
This will halve the number of charts to start with.
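A uniform size generator is trivial to sketch; something along these
lines could replace the logarithmic pick (the function and constant
names below are illustrative, not the actual wms_requests.py API):

```python
import random

# Bounds from the discussion above: tile-sized requests at the low end,
# screen-sized requests at the high end.
MIN_W, MAX_W = 256, 1024
MIN_H, MAX_H = 256, 768

def random_request_size(rng=random):
    """Pick a WMS WIDTH/HEIGHT pair uniformly between tile and screen sizes."""
    return rng.randint(MIN_W, MAX_W), rng.randint(MIN_H, MAX_H)

# A small batch of randomized request sizes for the load set
sizes = [random_request_size() for _ in range(10)]
```

Every width and height is then equally likely, rather than the load
being skewed toward bigger images.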

Polygon set: with its minimal styling (a plain uniform fill)
it makes for an interesting backend comparison case,
as the rendering part won't have to do anything fancy.
So I suggest running this one against all backends:
shapefile, postgis, oracle, SDE
-> 4 charts

Point set: an exercise in retrieving and blitting raster icons;
it won't require moving much data. There is filtering, but the
classification actually covers most of the point types, so not many
features will actually be shaved off the display; data-loading wise
it will just be important to have a spatial index.
Shall we compare only shapefiles?
-> 1 chart

Line set: with the TIGER 2008 data set this will be both an
exercise in rendering with order constraints and in data filtering,
as we'll be matching roughly 60% of the lines contained in the file*.
I would suggest shapefile, postgis, SDE
-> 3 charts

This would leave 3-4 charts for the raster part, which will still
be quite under-represented overall (next year maybe we do two
shootouts, a vector one and a raster one, ok? ;-) )

The above proposal is still quite arbitrary, so please provide
feedback so that we'll soon have an official set of test cases
to run.

Cheers
Andrea

* percentage of rendered edges:
select count(*) from edges_merge where mtfcc in ('S1740', 'S1400', 
'S1200', 'S1100') --> 3469093
select count(*) from edges_merge --> 5846060
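Doing the division on the two counts above, the matched fraction
works out to a bit under 60%:

```python
# Counts taken from the two SQL queries above
rendered = 3469093   # edges matching the mtfcc filter
total = 5846060      # all edges in edges_merge

fraction = rendered / total
print(f"{fraction:.1%}")  # -> 59.3%
```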


-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

