[postgis-users] Speeding up point-in-polygon search using ST_SimplifyPreserveTopology?

Evan Martin postgresql at realityexists.net
Tue May 29 01:47:07 PDT 2012


Thanks, Stephen, but I haven't been able to figure out how to get this 
to work on geography. ST_Intersection on geography doesn't work properly 
across the dateline. For example, I tried manually tiling a simple 
polygon that goes across the dateline, like this:

WITH data AS
(
     SELECT ST_GeomFromEWKT('SRID=4326;POLYGON((163 50,-176 55,-176 
60,163 60,163 50))')::geography AS simple
),
tiles AS
(
     SELECT ST_SetSRID('BOX(145 50,150 60)'::box2d, 4326) AS tile
     UNION ALL SELECT ST_SetSRID('BOX(150 50,155 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(155 50,160 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(160 50,165 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(165 50,170 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(170 50,175 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(175 50,180 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(-180 50,-175 60)'::box2d, 4326)
     UNION ALL SELECT ST_SetSRID('BOX(-175 50,-170 60)'::box2d, 4326)
)
SELECT ST_AsText(ST_Intersection(simple, tile::geography)) AS tiled
FROM data
JOIN tiles ON ST_Intersects(simple, tile::geography)

This returns:

POLYGON((163 49.9999999999887,160 50.0467770263591,160 
59.9999999999987,163 59.9999999999987,163 49.9999999999887))
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY
GEOMETRYCOLLECTION EMPTY
POLYGON((-175 54.9860851802267,-176 54.9999999999957,-176 
59.9999999999987,-175 59.9999999999987,-175 54.9860851802267))

I think ST_Intersects works OK on geography, but not ST_Intersection.

Regards,

Evan


On 29/05/2012 2:21 AM, Stephen Woodbridge wrote:
> On 5/28/2012 10:08 AM, Evan Martin wrote:
>> Hi all,
>>
>> I have a table of ~350 multi-polygons and ~185,000 points and I need to
>> find which points are inside which multi-polygons. Some polygons are
>> quite large and span the dateline, so I'm using geography ST_DWithin for
>> this (with a tolerance of 100m). My initial query looks like this
>> (simplified):
>>
>> SELECT ...
>> FROM points, polygons
>> WHERE ST_DWithin(point, real_area, 100)
>>
>> This works, but it takes about 90 minutes. I'm trying to make it faster
>> by using ST_SimplifyPreserveTopology. That worked very nicely for my
>> "adjacent polygons" problem
>> [http://postgis.refractions.net/pipermail/postgis-users/2012-January/031992.html], 
>>
>> because all polygons were modified in the same way, but this is
>> trickier. Since I'm modifying the polygon, but not the point, the
>> results are different. So I thought maybe I could do this in two phases:
>> if the point is well within or well outside the polygon then take the
>> result of the "simplified" check as correct, but if it's close to the
>> border then check it properly, ie.
>>
>> SELECT ...
>> FROM points, polygons
>> WHERE ST_DWithin(point, simple_area, 20000)
>> AND (ST_Distance(point, simple_border) > 20000 OR ST_DWithin(point,
>> real_area, 100))
>>
>> simple_area = ST_SimplifyPreserveTopology(real_area::geometry,
>> 0.01)::geography and simple_border = the exterior ring of simple_area.
>>
>> This takes about 18 minutes (a 5x improvement) and gives very similar
>> results, but not quite the same. It falls down on polygons that have
>> rhumblines along parallels, because they get turned into great circle
>> lines. Eg. the original polyon may have a rhumbline approximated as (24
>> 10,25 10,26 10,27 10), ST_SimplifyPreserveTopology does its job and
>> simplifies it to (24 10,27 10) and then ST_DWithin on geography treats
>> it as a great circle line, giving an incorrect result. I tried inserting
>> extra points to "unsimplify" the rhumblines, but that itself is very
>> slow and also quite a hack, because I can't really be sure which lines
>> were supposed be rhumblines when looking at the simplified polygon. I
>> feel like I'm so close and this is a very silly little problem - but it
>> is a problem.
>>
>> Could anyone suggest a way to work around this? Or a different approach
>> to the whole thing?
>
> I think the alternative approach that has been discussed on the list 
> in the past, and should be in the archives, is to cut the multiple 
> polygons into tiles with all the attributes of the original 
> multipolygon and then to match the points to the tiles..
>
> This works much faster for two basic reasons:
>
> 1. the number of points in the each tile is much less than the 
> original because it only contains a fraction of the originals complex 
> boundary but maintains all the detail.
> 2. since each tile is spatially smaller, you get better (faster) 
> interactions between the tile index and the points.
>
> -Steve
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list