[postgis-users] question on gist performance

Stefan Zweig stefanzweig1881 at web.de
Fri Sep 7 01:06:38 PDT 2007


hi mark,

thanks for your quick reply. i have tried your advice and rebuilded all my gist indexes on the table (via drop index, create index) and run a vacuum full afterwards. but that did not change the fact, that the query on the_geom is much slower than the same one on the_geom_1.

after that my colleague has tried another thing out:

"I have a small table (about 200 entries) containing large
(multipolygon) geometries (as discussed in http://postgis.refractions.net/docs/ch05.html),
stored in the_geom. Another column the_geom_envelope holds its
envelope geometries. Each column is indexed using GIST.

I experienced that doing a simple bounding box query results in
different times, dependent on the queried column:

  '...WHERE the_geom && expr;' takes about 250ms,
  '...WHERE the_geom_envelope && expr;' takes about 20ms.

ANALYZE says that GIST is used. As far as I know, the GIST makes use
only of the bounding boxes of Geometry objects, which are equal for
each object. How does it come that the query time using the_geom is more than
*ten times slower* than the query time using the_geom_envelope?"

which to me basically seems to be the same problem.

any ideas?

thanks in advance, stefan


> -----Ursprüngliche Nachricht-----
> Von: PostGIS Users Discussion <postgis-users at postgis.refractions.net>
> Gesendet: 06.09.07 09:40:18
> An: PostGIS Users Discussion <postgis-users at postgis.refractions.net>
> Betreff: Re: [postgis-users] question on gist performance


> 
> On Wed, 2007-09-05 at 18:12 +0200, Stefan Zweig wrote:
> > hi list,
> > 
> > i have noticed a strange thing while selecting geometries from a table and using the && operator (gist):
> > 
> > i have a table which holds all countrys of the world (somewhat more than 200) an their boundaries in column the_geom
> > the boundaries have a quite good resolution, so the total amount of points is huge.
> > for better performance (if low resolution is needed) i created a second column 
> > 
> > the_geom_1 as:
> > UPDATE table SET the_geom_1=simplify(the_geom,0.01)
> > 
> > which results in a less total amount of points compared to the original data.
> > 
> > i have set up a gist index on both columns
> > 
> > and now the strange thing:
> > SELECT name FROM _g2965 WHERE the_geom && SetSrid('BOX(13.760675 51.171104,15.231802 51.83737)'::box2d,4326);
> > takes 297ms
> > 
> > SELECT name FROM _g2965 WHERE the_geom_1 && SetSrid('BOX(13.760675 51.171104,15.231802 51.83737)'::box2d,4326);
> > takes 15ms
> > 
> > actually i would have thought that the query using the gist index should take the same time on both columns, because gist index strategy is comparing bounding boxes of the geometries with the bounding box of my query, isn't it?
> > can anybody give me a hint why there is such a massive time difference when selecting from different columns?
> > 
> > 
> > thanks in advance, stefan
> 
> 
> Hi Stefan,
> 
> I suspect that this because of your UPDATE statement. For each row that
> is touched during an UPDATE, a new entry is created in the GiST index -
> for various transactional reasons, the old version cannot be deleted.
> 
> For example: imagine I have your table with containing 200 countries of
> the world. When first created, your heap/indices look like this:
> 
> table: 200 rows
> the_geom_gist: 200 rows
> 
> Now you do your UPDATE table SET the_geom_1=simplify(the_geom,0.01):
> this creates new versions of each row in both the heap and the index.
> 
> table: 400 rows (200 live)
> the_geom_gist: 400 rows (200 live)
> 
> If you now build your second index:
> 
> table: 400 rows (200 live)
> the_geom_gist: 400 rows (200 live)
> the_geom_1_gist: 200 rows
> 
> The second index is smaller because the table only contained 200 live
> rows, and so these were the only ones used to build the index. Now I see
> from your schema that you've done this another two times which gives
> this:
> 
> table: 800 rows (200 live)
> the_geom_gist: 800 rows (200 live)
> the_geom_1_gist: 600 rows (200 live)
> the_geom_2_gist: 400 rows (200 live)
> the_geom_3_gist: 200 rows
> 
> So as you can see the original index is 4 times the size of the original
> which is why it takes longer to traverse. The quick solution would be to
> rebuild all of the GiST indices on your table (if these tables are
> fairly static you may wish to do a VACUUM FULL too). However, I would
> probably create one table with just a primary key and geometry column
> for each different resolution, and then perform a join onto your
> original table based upon the primary key, dropping the geometry columns
> from your original table.
> 
> 
> HTH,
> 
> Mark.
> 
> -- 
> ILande - Open Source Consultancy
> http://www.ilande.co.uk
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
> 




_______________________________________________________________________
Jetzt neu! Schützen Sie Ihren PC mit McAfee und WEB.DE. 3 Monate
kostenlos testen. http://www.pc-sicherheit.web.de/startseite/?mc=022220




More information about the postgis-users mailing list