[postgis-users] Documenting performance tip

strk at refractions.net strk at refractions.net
Fri Feb 4 01:08:37 PST 2005


Forgot to mention: I suggest having a new chapter:
6. Performace Tips
I belive we will have more to say about it now or in the future.
Comments?
--strk;

On Fri, Feb 04, 2005 at 10:02:57AM +0100, strk at refractions.net wrote:
> Markus, thanks for the insight. 
> A side-effect of your suggestion is that users will end up
> actually using box2d type in their tables.
> I think they should be warned that those column won't retain
> their original precision between dump/reloads.
> More so now that box2d canonical ascii output has been changed
> to use default precision (%g).
> What do you think ?
> 
> --strk;
> 
> On Thu, Feb 03, 2005 at 02:11:24PM +0100, Markus Schaber wrote:
> > Hi, Strk,
> > 
> > strk at refractions.net schrieb:
> > 
> > > Beside writers skills this needs full understanding of the problem.
> > > If you can document the problem and the tip it would be helpful.
> > 
> > Okay, I'll give it a try:
> > 
> > - snip -
> > 
> > *** Performance tip for small tables of large geometries ***
> > 
> > * Problem Description *
> > 
> > Current PostgreSQL versions (including 8.0) suffer from a query
> > optimizer weakness regarding TOAST tables. TOAST tables are a kind of
> > "extension room" used to store large (in the sense of data size) values
> > that do not fit into normal data pages (like long texts, images or
> > complex geometries with lots of vertices), see
> > http://www.postgresql.org/docs/8.0/static/storage-toast.html for more
> > information).
> > 
> > The problem appears if you happen to have a table with rather large
> > geometries, but not too much rows of them (like a table containing the
> > boundaries of all european countries in high resolution). Then the table
> > itsself is small, but it uses lots of TOAST space. In our example case,
> > the table itsself had about 80 rows and used only 3 data pages, but the
> > TOAST table used 8225 pages.
> > 
> > Now issue a query where you use the geometry operator && to search for a
> > bounding box that matches only very few of those rows. Now the query
> > optimizer sees that the table has only 3 pages and 80 rows. He estimates
> > that a sequential scan on such a small table is much faster than using
> > an index. And so he decides to ignore the GIST index. Usually, this
> > estimation is correct. But in our case, the && operator has to fetch
> > every geometry from disk to compare the bounding boxes, thus reading all
> > TOAST pages, too.
> > 
> > To see whether your suffer from this bug, use the "EXPLAIN ANALYZE"
> > postgresql command. For more information and the technical details, you
> > can read the thread on the postgres performance mailing list:
> > http://archives.postgresql.org/pgsql-performance/2005-02/msg00030.php
> > 
> > * Workarounds *
> > 
> > The PostgreSQL people are trying to solve this issue by making the query
> > estimation TOAST-aware. For now, here are two workarounds:
> > 
> > The first workaround is to force the query planner to use the index.
> > Send "SET enable_seqscan TO off;" to the server before issuing the
> > query. This basically forces the query planner to avoid sequential scans
> > whenever possible. So it uses the GIST index as usual. But this flag has
> > to be set on every connection, and it causes the query planner to make
> > misestimations in other cases, so you should "SET enable_seqscan TO on;"
> > after the query.
> > 
> > The second workaround is to make the sequential scan as fast as the
> > query planner thinks. This can be achieved by creating an additional
> > column that "caches" the bbox, and matching against this. In our
> > example, the commands are like:
> > 
> > SELECT addGeometryColumn('mydb','mytable','bbox','4326','GEOMETRY','2');
> > UPDATE mytable set bbox = setsrid(box3d(geom_column)::geometry, 4326);
> > 
> > Now change your query to use the && operator against bbox instead of
> > geom_column, like:
> > 
> > SELECT geom_column FROM mytable WHERE bbox && setsrid('BOX3D(0 0,1
> > 1)'::box3d,4326);
> > 
> > Of yourse, if you change or add rows to mytable, you have to keep the
> > bbox "in sync". The most transparent way to do this would be triggers,
> > but you also can modify your application to keep the bbox column current
> > or run the UPDATE query above after every modification.
> > 
> > - snap -
> > 
> > The question is where to place it. I think adding it to paragraph 4.5.2
> > in the doc would bloat it too much, so introducing a 4.5.3 might be the
> > best.
> > 
> > Markus
> > --
> > markus schaber | dipl. informatiker
> > logi-track ag | rennweg 14-16 | ch 8001 zürich
> > phone +41-43-888 62 52 | fax +41-43-888 62 53
> > mailto:schabios at logi-track.com | www.logi-track.com
> 
> 
> 
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at postgis.refractions.net
> > http://postgis.refractions.net/mailman/listinfo/postgis-users



More information about the postgis-users mailing list