[postgis-users] Deleting many points falling outside a large polygon (but within the bbox)

Thu Jul 22 13:17:02 PDT 2010

Hi Tim,

A point-in-polygon test approach would seem to make sense. I did a quick 
Google search and ran into posting on Paul Ramsey's Clever Elephant blog 
that seems to describe exactly your situation: 
http://blog.cleverelephant.ca/2008/09/point-in-polygon-shortcuts.html

A few more comments on this from Paul about implementation would likely 
be helpful, at least for me.

Dan

On 07/22/2010 12:44 PM, Tim Keitt wrote:
> I have a very large table of points; basically I pushed a raster
> extending over much of the western hemisphere at 1km resolution into
> the db as xyz (actually 2d points + z in another column). Not a crazy
> as it sounds as you can do a lot of interesting things intersecting
> these grid points with other geometries, and I have a lot of RAM and
> disk space available. The region of interest however is much smaller
> than the original raster and is defined by a large polygon (certain
> continent margins) composed of many vertices. Deleting the points
> outside the bounding box of the roi is reasonably quick, however there
> are many points that remain within the bbox, but outside the polygon.
> These take forever to cull as it appears the entire polygon has to be
> searched for each point. Its looking like this will at least take
> days, perhaps much more on a fairly fast machine.
>
> I'm curious if anyone has a reasonable solution. I was thinking of
> dumping the roi polygon as points and then recursively subdividing the
> bounding box, building quad-polygons on the way down. Those quads that
> contain roi points are split into 4 while those that contain no points
> remain. After a few recursion levels, you figure out which
> quad-polygons are disjoint from the roi polygon and delete any
> enclosed grid points. Points intersecting quads that are within the
> roi polygon are not touched. Grid points within the remaining quads
> would have to be searched one-by-one, but that should be a small
> fraction of the total. Basically the idea is to emulate a kind of
> quad-tree index in pure sql. Or alternatively I'll just come back in a
> few weeks and see if the brute force query is done...
>
> THK
>
>