[postgis-users] efficiently testing for points near polygons

Thu Feb 10 07:20:22 PST 2011

Hi,

Thanks for this response.  Regarding indexes, I was mainly asking whether
an index on the point column has any value, given that the bounding box of
a point is trivial.
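On the point-index question: even though a point's bounding box is degenerate, an index still organizes the points spatially. A query can therefore fetch only the points whose boxes fall inside a polygon's box expanded by the search distance, and then run the exact distance test on that short list. This is roughly the strategy ST_DWithin follows (box test against an expanded box, then an exact distance check), and it avoids building buffers at all. Below is a minimal pure-Python sketch of that two-step pattern. The class PointGrid, its cell parameter, and the uniform bucket grid are hypothetical stand-ins for the GiST index PostGIS actually builds, and distances are in whatever units the coordinates use.

```python
from collections import defaultdict
from math import floor, hypot


def point_in_ring(pt, ring):
    """Even-odd ray casting; ring is an implicitly closed list of (x, y)."""
    x, y = pt
    inside = False
    for i in range(len(ring)):
        (x1, y1), (x2, y2) = ring[i - 1], ring[i]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg2 = dx * dx + dy * dy
    t = 0.0 if seg2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg2))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def dist_to_polygon(pt, ring):
    """0 inside the polygon, otherwise the distance to the nearest edge."""
    if point_in_ring(pt, ring):
        return 0.0
    return min(dist_to_segment(pt, ring[i - 1], ring[i]) for i in range(len(ring)))


class PointGrid:
    """Hypothetical bucket index over points (PostGIS uses GiST instead)."""

    def __init__(self, points, cell):
        self.cell = cell
        self.buckets = defaultdict(list)
        for p in points:
            self.buckets[self._key(p)].append(p)

    def _key(self, p):
        return floor(p[0] / self.cell), floor(p[1] / self.cell)

    def in_box(self, xmin, ymin, xmax, ymax):
        """Yield only points inside the box, visiting only overlapping buckets."""
        i0, j0 = self._key((xmin, ymin))
        i1, j1 = self._key((xmax, ymax))
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                for x, y in self.buckets.get((i, j), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        yield (x, y)


def points_within(index, ring, d):
    """Step 1: box filter on the polygon bbox expanded by d.
    Step 2: exact distance recheck on the surviving candidates."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    candidates = list(index.in_box(min(xs) - d, min(ys) - d, max(xs) + d, max(ys) + d))
    return candidates, [p for p in candidates if dist_to_polygon(p, ring) <= d]


# Example: points within 1 unit of a 10 x 10 square "island".
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
index = PointGrid([(5, 5), (10.5, 5), (11.5, 5), (10.9, 10.9), (50, 50)], cell=5)
candidates, near = points_within(index, square, 1.0)
```

In this example (10.9, 10.9) passes the box filter but lies about 1.27 from the corner, so the exact recheck drops it, while (11.5, 5) and (50, 50) are never even looked at by the exact test.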

Based on your suggestions, I dumped my multipolygons to polygons and
sliced up my continent polygons to make querying faster.  This speeds
things up dramatically (by a factor of 100).
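To see why cutting helps, compare the area of a polygon's single bounding box with the total area of the boxes of its grid-cell pieces: every point that lands inside a box but outside the polygon costs an exact recheck. The sketch below uses Sutherland-Hodgman clipping against each grid cell. That is a swapped-in technique for illustration, not the approach of the attached functions, and all function names here are hypothetical. One caveat: for a concave polygon that enters the same cell twice, Sutherland-Hodgman returns one ring joined by zero-width edges rather than two separate pieces, so that cell's box may be looser than necessary.

```python
from math import ceil, floor


def bbox_of(ring):
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def box_area(ring):
    xmin, ymin, xmax, ymax = bbox_of(ring)
    return (xmax - xmin) * (ymax - ymin)


def ring_area(ring):
    """Shoelace formula; ring is an implicitly closed list of (x, y)."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring[-1:] + ring[:-1], ring))
    return abs(s) / 2


def clip_half_plane(ring, inside, crossing):
    """One Sutherland-Hodgman pass: keep the part of ring where inside(p) holds."""
    out = []
    for prev, cur in zip(ring[-1:] + ring[:-1], ring):
        if inside(cur):
            if not inside(prev):
                out.append(crossing(prev, cur))
            out.append(cur)
        elif inside(prev):
            out.append(crossing(prev, cur))
    return out


def at_x(c):
    """Intersection of segment p-q with the vertical line x = c."""
    return lambda p, q: (c, p[1] + (c - p[0]) / (q[0] - p[0]) * (q[1] - p[1]))


def at_y(c):
    """Intersection of segment p-q with the horizontal line y = c."""
    return lambda p, q: (p[0] + (c - p[1]) / (q[1] - p[1]) * (q[0] - p[0]), c)


def clip_to_cell(ring, xmin, ymin, xmax, ymax):
    passes = [
        (lambda p: p[0] >= xmin, at_x(xmin)),
        (lambda p: p[0] <= xmax, at_x(xmax)),
        (lambda p: p[1] >= ymin, at_y(ymin)),
        (lambda p: p[1] <= ymax, at_y(ymax)),
    ]
    for inside, crossing in passes:
        ring = clip_half_plane(ring, inside, crossing)
        if not ring:
            break
    return ring


def cut_by_grid(ring, step):
    """Clip ring to every step x step cell overlapping its bbox; drop empty pieces."""
    xmin, ymin, xmax, ymax = bbox_of(ring)
    pieces = []
    for i in range(floor(xmin / step), ceil(xmax / step)):
        for j in range(floor(ymin / step), ceil(ymax / step)):
            piece = clip_to_cell(ring, i * step, j * step, (i + 1) * step, (j + 1) * step)
            if len(piece) >= 3 and ring_area(piece) > 1e-12:
                pieces.append(piece)
    return pieces


land = [(0, 0), (10, 0), (10, 2), (2, 2), (2, 10), (0, 10)]  # an L-shaped "continent"
pieces = cut_by_grid(land, 5)
```

For this L-shaped example (area 36), the single bounding box covers 100 square units, while the boxes of the three pieces cover 45 in total, so far fewer outside points survive the box filter.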

Along the way, I wrote some fairly generic functions for generating grids
and for slicing polygons along a linestring grid.  I imagine someone has
already created these, but as far as I know they aren't in the stable
version of PostGIS, so I am attaching them for anyone interested.  If they
prove useful, perhaps they would be worth adding to PostGIS...?

The main functions are ST_MakeLinestringGrid and ST_CutPolyByLinestring.
As an example of their use:

SELECT ogc_fid, ST_CutPolyByLinestring(wkb_geometry,
	ST_SetSRID(ST_MakeLinestringGrid(5),ST_SRID(wkb_geometry))) 
       FROM gpoly;

This cuts every polygon in wkb_geometry along a 5-degree by 5-degree grid
and returns the individual pieces as polygons.  wkb_geometry must be of
type POLYGON, not MULTIPOLYGON.
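The attachment itself is not reproduced in the archive, so the following is only a guess at the kind of output a linestring grid generator such as ST_MakeLinestringGrid(5) produces: meridians and parallels every `step` degrees, assumed here to span the whole globe (-180 to 180, -90 to 90), written out as WKT. The names make_linestring_grid and to_wkt, and the default extent, are hypothetical.

```python
from math import floor


def make_linestring_grid(step, xmin=-180.0, ymin=-90.0, xmax=180.0, ymax=90.0):
    """Return grid lines as ((x1, y1), (x2, y2)) pairs spaced `step` apart.

    Assumes the extent is (close to) a whole multiple of step; the small
    epsilon guards against floating-point shortfall in the division.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    nx = floor((xmax - xmin) / step + 1e-9)
    ny = floor((ymax - ymin) / step + 1e-9)
    # Vertical lines (meridians), computed from the index to avoid drift.
    lines = [((xmin + i * step, ymin), (xmin + i * step, ymax)) for i in range(nx + 1)]
    # Horizontal lines (parallels).
    lines += [((xmin, ymin + j * step), (xmax, ymin + j * step)) for j in range(ny + 1)]
    return lines


def to_wkt(lines):
    """Serialize the grid as a single WKT MULTILINESTRING."""
    parts = ",".join(
        f"({x1:.15g} {y1:.15g},{x2:.15g} {y2:.15g})" for (x1, y1), (x2, y2) in lines
    )
    return f"MULTILINESTRING({parts})"


grid = make_linestring_grid(5)
wkt = to_wkt(grid)
```

A 5-degree grid over this extent has 73 meridians and 37 parallels. On the database side, text like this could be turned into a geometry with ST_GeomFromText(wkt, srid), which would take the place of the ST_SetSRID call in the query above.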

ST_CutPolyByLinestring works perfectly for simple cases, but for very
irregular polygons some pieces don't get cut as they should.  This is
because, when the grid is intersected with the polygon, there appears to
be enough rounding error that ST_Polygonize sometimes doesn't recognize
that the remaining pieces of the grid touch the edges of the original
polygon.  Perhaps ST_SnapToGrid would help?  If anyone has ideas for
fixing this problem, please let me know.  For my purposes this has little
impact, since enough cutting still happens to tighten the bounding boxes
of most polygons.
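The rounding problem can be reproduced outside the database: a coordinate computed along one path can differ in its last bits from "the same" coordinate computed along another, so an exact-equality test concludes they do not touch. Snapping both to a common grid, which is what ST_SnapToGrid does to geometry coordinates, makes them compare equal. This is a hypothetical plain-Python illustration, not a fix for the attached functions. Snapping can still separate two nearly equal values that straddle a rounding boundary, and snapping polygon vertices can produce invalid geometries, so a tolerance-based comparison is the more robust check.

```python
def snap(pt, size):
    """Round each coordinate to the nearest multiple of size (cf. ST_SnapToGrid)."""
    return tuple(round(c / size) * size for c in pt)


def touches(p, q, tol):
    """Tolerance-based alternative: treat points closer than tol as coincident."""
    return abs(p[0] - q[0]) <= tol and abs(p[1] - q[1]) <= tol


# Two computations of "the same" point that disagree in the last bits:
# 0.1 + 0.2 evaluates to 0.30000000000000004 in IEEE 754 doubles.
vertex = (0.3, 5.0)
computed = (0.1 + 0.2, 5.0)
```

Here `vertex == computed` is false, while `snap(vertex, 1e-9) == snap(computed, 1e-9)` and `touches(vertex, computed, 1e-9)` are both true.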

Cheers,
David

On Tue, 2011-02-08 at 15:42 +0100, Nicklas Avén wrote:
> Hello
> 
> Overall performance will be better if you use ST_DWithin instead of
> ST_Buffer plus ST_Intersects. But if the buffer is computed only once
> and never again, there may not be much gain from switching.
> 
> The bounding boxes are very important here. The index works by first
> finding overlapping bounding boxes and then rechecking those candidates
> to determine whether the actual geometries intersect (or are within the
> given distance).
> 
> Because of that, the smaller your geometries are, the better, which
> means you should absolutely not union them together. If you have whole
> continents in one polygon, the index will be totally worthless because
> most points will fall inside the bounding box anyway. So it might be
> difficult to get good performance out of a dataset with very big
> geometries, but there are techniques to slice them up and index the
> smaller parts instead. For a good description of those techniques,
> "PostGIS in Action" is a good source (http://manning.com/obe).
> 
> 
> /Nicklas
> 
> 
> On Tue, 2011-02-08 at 14:47 +0100, David Kaplan wrote:
> > Hi,
> > 
> > I have a large set of oceanic point data that I need to test to identify
> > all points that are within a certain distance of land.  I initially
> > tried doing the most obvious thing I could think of - buffer land
> > polygons and test points for intersection, but this is taking too long
> > for the full dataset and is giving me strange "terminating connection
> > due to administrator command" errors.  I imagine that there is something
> > in my initial strategy that doesn't scale well, but it is hard for me to
> > identify the best strategy.  Hopefully someone already knows the
> > answer and can give me a hand.
> > 
> > Here are some concrete questions that I hope someone can help me with:
> > 
> > 1) How exactly are indexes used for point data?  As a bounding box
> > doesn't make much sense for a point (at least to me), I imagine that
> > the index somehow groups identical points so that operations only have
> > to be done once for each distinct point.  Is this the case?
> > 
> > 2) I initially made a function to test for proximity to land:
> > 
> > CREATE OR REPLACE FUNCTION near_land(geo geometry)
> > RETURNS boolean AS
> > $BODY$
> > SELECT bool_or( ST_Intersects($1,wkb_geometry_5km_buf) ) 
> > FROM gshhs_h_l1;
> > $BODY$ 
> > LANGUAGE 'sql' STABLE;
> > 
> > This works fast for small amounts of data, but I have a feeling that
> > this prohibits using indexes for the full dataset and therefore could be
> > causing much of my slowness.  Is this so?
> > 
> > 3) Would it be best to ST_Union all my polygons before making the
> > comparison, knowing that the polygons don't overlap?  Or is this no
> > different than just doing the intersect and then using bool_or?
> > 
> > 4) Would using long transaction support help me avoid the "administrator
> > command" errors?  I don't really know what long transaction support
> > does, but my transaction certainly is long...
> > 
> > Thanks for the help.
> > 
> > Cheers,
> > David
> > 
> > 
> > 
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

-- 
**********************************
David M. Kaplan
Charge de Recherche 1

Institut de Recherche pour le Developpement
Centre de Recherche Halieutique Mediterraneenne et Tropicale
av. Jean Monnet
B.P. 171
34203 Sete cedex
France

Phone: +33 (0)4 99 57 32 27
Fax: +33 (0)4 99 57 32 95

http://www.ur097.ird.fr/team/dkaplan/index.html
http://www.amped.ird.fr/
**********************************

-------------- next part --------------
A non-text attachment was scrubbed...
Name: polygon_clipping_functions.sql
Type: text/x-sql
Size: 2632 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20110210/314d3ee3/attachment.bin>