[postgis-users] efficiently testing for points near polygons

Tue Feb 8 06:42:02 PST 2011

Hello

Overall performance will be better if you use ST_DWithin instead of
ST_Buffer plus ST_Intersects. But if the buffering is done once and never
needs to be repeated, there may not be much to gain from that.
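
As a rough sketch of what that could look like: the points table
ocean_points(id, geom) and the unbuffered column wkb_geometry are
assumed names, not taken from your schema, and the 5000 only means
5 km if both columns are stored in a projected SRID with metre units
(with lon/lat data the distance would be in degrees).

```sql
-- Hypothetical names: ocean_points(id, geom), gshhs_h_l1(wkb_geometry).
-- Assumes both geometry columns share a projected SRID in metres.

-- A GiST index on the unbuffered land polygons lets ST_DWithin
-- use its bounding-box prefilter.
CREATE INDEX gshhs_h_l1_geom_idx
    ON gshhs_h_l1 USING gist (wkb_geometry);

-- All points within 5 km of any land polygon.
-- EXISTS stops at the first matching polygon for each point.
SELECT p.id
FROM ocean_points AS p
WHERE EXISTS (
    SELECT 1
    FROM gshhs_h_l1 AS l
    WHERE ST_DWithin(p.geom, l.wkb_geometry, 5000)
);
```

Written as a plain join like this, rather than wrapped in a function
called once per point, the planner is free to use the index.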

The bounding boxes are very important here. The index works by first
finding overlapping bounding boxes, and then a recheck calculates whether
the real geometries actually intersect (or are within the given
distance).
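
To make the two steps visible, here is roughly the same kind of test
written out by hand. This is only an illustration of the idea, not the
exact internal definition of ST_DWithin; the table and column names
ocean_points(id, geom) and wkb_geometry are assumed, and the distance
is in SRID units (metres assumed here).

```sql
-- Hypothetical names: ocean_points(id, geom), gshhs_h_l1(wkb_geometry).
SELECT DISTINCT p.id
FROM ocean_points AS p
JOIN gshhs_h_l1 AS l
  -- Step 1: cheap bounding-box test that a GiST index can answer.
  -- ST_Expand grows the point's box by 5000 units in every direction.
  ON l.wkb_geometry && ST_Expand(p.geom, 5000)
  -- Step 2: exact recheck on the real geometries, run only for the
  -- candidates that passed the bounding-box test.
 AND ST_Distance(p.geom, l.wkb_geometry) <= 5000;
```

The DISTINCT is there because a point close to several polygons would
otherwise be returned once per polygon.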

Because of that, the smaller your geometries are the better, which means
you should absolutely not union them together. If you have whole
continents in one polygon, the index will be totally worthless because
most points will fall inside the bounding box anyway. So it might be
difficult to get good performance out of a dataset with very big
geometries, but there are techniques to slice them up and index the
smaller parts instead. "PostGIS in Action" (http://manning.com/obe) has a
good description of that.
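
One possible way to do the slicing, as a sketch only: ST_Subdivide is
available in PostGIS 2.2 and later, so it may not exist in your
installation, and it is not necessarily the method the book describes.
The table name gshhs_h_l1_parts, the column name wkb_geometry and the
limit of 256 vertices are assumptions chosen for illustration.

```sql
-- Requires PostGIS 2.2 or later for ST_Subdivide.
-- Hypothetical names: gshhs_h_l1(wkb_geometry), gshhs_h_l1_parts.

-- Cut every land polygon into pieces of at most 256 vertices each;
-- ST_Subdivide returns one row per piece.
CREATE TABLE gshhs_h_l1_parts AS
SELECT ST_Subdivide(wkb_geometry, 256) AS geom
FROM gshhs_h_l1;

-- Index the small pieces; their bounding boxes are much tighter
-- than the bounding box of a whole continent.
CREATE INDEX gshhs_h_l1_parts_geom_idx
    ON gshhs_h_l1_parts USING gist (geom);
```

The distance test would then be run against gshhs_h_l1_parts instead
of the original table.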

/Nicklas

On Tue, 2011-02-08 at 14:47 +0100, David Kaplan wrote:
> Hi,
> 
> I have a large set of oceanic point data that I need to test to identify
> all points that are within a certain distance of land.  I initially
> tried doing the most obvious thing I could think of - buffer land
> polygons and test points for intersection, but this is taking too long
> for the full dataset and is giving me strange "terminating connection
> due to administrator command" errors.  I imagine that there is something
> in my initial strategy that doesn't scale well, but it is hard for me to
> identify what is the best strategy.  Hopefully someone already knows the
> answer and can give me a hand.
> 
> Here are some concrete questions that I hope someone can help me with:
> 
> 1) How exactly are indexes used for point data?  As a BBox doesn't make
> much sense (at least to me), I imagine that this somehow groups
> identical points so that operations only have to be done once for each
> distinct point.  Is this the case?
> 
> 2) I initially made a function to test for proximity to land:
> 
> CREATE OR REPLACE FUNCTION near_land(geo geometry)
> RETURNS boolean AS
> $BODY$
> SELECT bool_or( ST_Intersects($1,wkb_geometry_5km_buf) ) 
> FROM gshhs_h_l1;
> $BODY$ 
> LANGUAGE 'sql' STABLE;
> 
> This works fast for small amounts of data, but I have a feeling that
> this prohibits using indexes for the full dataset and therefore could be
> causing much of my slowness.  Is this so?
> 
> 3) Would it be best to ST_Union all my polygons before making the
> comparison, knowing that the polygons don't overlap?  Or is this no
> different than just doing the intersect and then using bool_or?
> 
> 4) Would using long transaction support help me avoid the "administrator
> command" errors?  I don't really know what long transaction support
> does, but my transaction certainly is long...
> 
> Thanks for the help.
> 
> Cheers,
> David
> 
> 
>