[postgis-users] strange benchmark results
Andreas Brandl
ml at 3.141592654.de
Wed Jan 6 07:05:44 PST 2010
----- "Mark Cave-Ayland" <mark.cave-ayland at siriusit.co.uk> wrote:
> Andreas wrote:
>
> > Hi folks,
> >
> > I'm in the middle of my bachelor thesis in computer science which is
> all about spatial databases. The main part is benchmarking most of the
> spatial functions described by OGC simple features on different
> spatial databases (postgis, ms sql server 2008, ibm db2 gse).
> >
> > The last days I ran the benchmark for PostGIS and had a first look
> on the results. Some test cases show a strange behaviour which I like
> to share here hoping for some feedback.
> >
> > The benchmark consists of 30 queries which are kept simple and
> mainly testing one OGC function at a time. The main aspect is
> measuring the query time via JDBC and via the output of "explain
> analyze". To achieve a better quality the whole benchmark is run
> multiple times ("rounds").
> >
> > The benchmark was run on Windows XP (due to having ms sql server in
> the benchmark), 2 GB RAM, core 2 duo with default postgres
> configuration (I overlooked to tune, but will catch up) and having a
> GIST index on all geometry columns. PostgreSQL 8.4.1 and PostGIS
> 1.4.0.
> >
> > So there is a query like
> >
> > SELECT ST_Union(the_geom) FROM us_county
> >
> > which runs an aggregate union on single polygons of TIGER data.
> >
> > The odd result is, that over time (in the course of 100 rounds) the
> query time gets "shaked up", ranging from 100 seconds in the beginning
> up to 800 seconds in the end. Please see the first plot [1] for
> details. x-coordinate shows round n (1-100), y-coordinate the measured
> query time using EXPLAIN ANALYZE.
> >
> > There are other functions which show a similar behaviour (i.e.
> ST_Within, ST_Touches, ST_Buffer). These results are also shown on
> [1].
> >
> > I appreciate any hint and feedback on these results. My (heavily
> guessed) interpretation of this one is that there may exist kind of
> leak and something like a garbage collection cleanup up in between.
> >
> > I'm quite sure that there is no other influence on the system the
> benchmark ran on.
> >
> > Running the benchmark on a ubuntu 64-bit does not show this
> behaviour (most results do have a small deviation). This leads me to a
> more general question: Is it in general advisable to run spatial
> databases on 64-bit OS?
> >
> > Another general question: Are there more specific tuning advices for
> postgis (regarding config) than those which apply to Postgres in
> general?
> >
> > If I may provide more information, please let me know.
> >
> > Thanks a lot!
> >
> > Best Regards,
> > Andreas
>
Hi,
sorry for the delay - getting those benchmark results took quite some time...
> You should find that the results should be the same across any
> platform;
> however you've missed some very important information from your tests.
>
> The first questions I would like to ask are:
>
> i) Do you re-open a connection for every SELECT command you issue to
> the
> database, or do you use a new connection? If you don't, does it make a
>
> difference if you do?
Yes, in fact, running the same benchmark but re-opening the connection for every SELECT does make a difference. I prepared an comparison chart showing the timeline for either test-modes. [1]
"test1" (on the left) shows the query times working with a single Connection for all tests. The described effect may not be overlooked IMO.
"test2" (on the right) shows same tests with the mode suggested by Mark (re-opening Connection). Here the effect does not show up.
Both test1, test2 were run on the same machine (2 GB RAM, core 2 duo) and same configuration, which may be found here [2]. Config is a tuned default config (changed shared_buffers=128MB, work_mem=10MB, effective_cache_size=1500MB).
BTW: Tuning the postgresql.conf resulted in overall better results (as expected) but had no impact on this effect.
I would really appreciate any insight on this behaviour. Next I'm going to run the benchmark on a linux OS.
Regards,
Andreas
[1] http://students.fim.uni-passau.de/~brandlan/ba/benchmark/postgis-02/timeline_postgres_index.html
[2] http://pastebin.com/m1f10fb88
More information about the postgis-users
mailing list