[postgis-users] Spatial join optimisation

Bill Binko bill at binko.net
Thu May 12 22:45:27 PDT 2005


On Fri, 13 May 2005, Rhys Ickeringill wrote:

> Thanks Paul, Bill.
> 
> HUGE speed improvement. According to the postgres ANALYSE command your
> suggestions cut the estimated query cost by a factor of 1000 (!). My query
> hasn't finished yet - judging by how long it took to compute 100 records, my
> estimate is one hour for the whole lot.

I suppose I should have asked how many shapes you were working with... 
perhaps seconds was optimistic.  However, an hour still seems a long time.

I noticed below that you said that only one query was being used, even in 
the final solution.  Could you post the query plan (explain select....) on 
that?  

If your "Points" sometimes lie outside ALL of the polygons in your Polygon 
table, you can get another benefit by first calculating the extent() of 
your polygons and adding something like this to your query

the_point_geom && setSRID('POLYGON((224941.15625 1192115.125,224941.15625 
1396262.875,311462.8125 1396262.875,311462.8125 1192115.125,224941.15625 
1192115.125))'::geometry, 26759)

Of course, that's my extent, not yours.  That will leverage both of the 
indexes you have.

to get the extent, you can just do something like

select asText(extent(the_polygon_geom)) from Polygons;

I'm considering creating an Extents table that contains the extents of my 
very static layers.  That way I could just join them in.

> 
> Bill, the difference between your "quick and dirty" approach and your final
> approach is a factor of two or so - purely due to there being no geometry
> index on the points, hence the sequential scan in the join goes to the
> [larger] points table (and the polygon geometry index is used). If I create
> an index on the points table...
> 
> CREATE INDEX GeoIdx ON Points USING GIST( SetSRID( MakePoint(
> Points.longitude, Points.latitude ), 32767 ) )
> 
> this index gets used and your "quick and dirty" approach runs perhaps 10%
> slower than your final (and Paul's) approach - I guess this is the overhead
> of making points on the fly.

It's also using Postgresql's functional indexes, which probably adds some 
overhead.  You're beter off adding a geometry column, and deleting the 
lat/long columns: you can always extract them from the shape if you need 
them.

> Is it worth (for the benefit of the list) beginning a discussion on what
> "geometry operators" are best used with each "geometry processing function",
> with an eye to including a "tip" associated with each geometry processing
> function's entry in the PostGIS manual? (I'd be happy to do the
> documentation changes)

I think there is definitely a need for a "Tuning Guide".  The 
documentation mentions the fact that you need to use the operators for 
Postgresql to consider using indexes, but there are many other questions 
that could be explored such as:

1) When to extract non-spatial attributes and join on those instead
2) How to use Postgresql's partial indexes (ones with a WHERE clause)
3) How to use indexes with spatial and non-spatial attributes (is it 
possible?)
4) What are the relative costs (in terms of order of complexity) of 
the various operations

I know much of this would be trial-and-error, but there needs to be a 
place for general guidelines.

BTW, Rhys, you also should consider modifying the performance parameters 
in postgresql.conf as some of them make a huge difference 
(random_page_cost = 2 really helped me!).

Bill



More information about the postgis-users mailing list