[postgis-users] Tricks to find polygon/line intersection faster

Evan Martin postgresql at realityexists.net
Mon Feb 10 12:33:00 PST 2014


I've discovered a slight problem with the handy "tiled intersection" 
trick suggested earlier: some of my lines run exactly along a meridian 
or along a parallel and so do the tiles, so those intersections get 
counted twice! For example, LINESTRING(-18 14.5,-18 15.5) results in the 
following intersections with a particular tiled polygon

LINESTRING(-17.9999148001148 14.7502863979935,-17.9998863986648 
15.0000002498516)
LINESTRING(-18.0000851998852 14.7502863979933,-18.0001136013352 
15.0000002498516)

LINESTRING(-18.0001136013352 15.0000002498516,-18.0000852013123 
15.2502965758817)
LINESTRING(-17.9998863986648 15.0000002498516,-17.9999147986877 
15.250296575882)

Could someone suggest the best (fastest while still accurate) way to 
filter out such duplicates? As you can see, they're not exactly the 
same, so ST_Equals() returns false on them.

Evan

On 08/07/2013 17:36, Evan Martin wrote:
> Thanks, Steve, that seems to do the trick. Of course the results 
> change a bit, so it's a trade-off of accuracy vs. speed. I presume the 
> change is because I do the tiling on the plane - ST_Intersection(geom, 
> geom). When I tried doing tiling on geography the results changed much 
> more (compared to no tiling). Would be interesting to understand why 
> that is. Am I doing something wrong? I create a grid of 1x1 degree 
> polygons and then do this:
>
>     SELECT poly_id, ST_Intersection(poly_border::geometry, 
> tile)::geography AS poly_tile
>     FROM my_polygon p
>     JOIN world_tile_1 t ON ST_Intersects(p.border::geometry, t.tile)
>
> The intersection with lines is then done on geography, as before. I 
> only do this for polygons that don't span the dateline (which is 99% 
> of them, luckily).
>
> Evan
>
> On 06.07.2013 21:19, Stephen Woodbridge wrote:
>> The standard way of dealing this this is to chop you really large 
>> polygons into tiles. Or if the multipolygons can be split into 
>> multiple individual polygons you might get better performance.
>>
>> google: postgis tiling large polygons
>>
>> if you need the distance that the line intersects the multiple tiles 
>> or multiple split multipolygons you will need to sum() and group on 
>> the original id of the split object.
>>
>> -Steve
>>
>> On 7/6/2013 1:10 PM, Evan Martin wrote:
>>> It's not really "many large things vs many large things". Most lines 
>>> are
>>> < 100 km long (but there are some over 1000 km). Here's a percentile
>>> chart: https://imageshack.us/a/img16/940/w5s.png
>>>
>>> Most of the polygons are also quite small and simple, but there are a
>>> few really large complex ones. From my testing it looks like a few of
>>> the "worst" polygons (multi-polygons, actually) take all the time, so
>>> that 25,000 count was a bit misleading. 96% of them have < 100 points,
>>> but the worst one has > 23,000. I couldn't get the area, because
>>> ST_Area(geog) is returning some ridiculously high numbers, but it would
>>> be millions of sq km.
>>>
>>> On 06.07.2013 5:48, Paul Ramsey wrote:
>>>> Without seeing your data it's quite hard to say. Many large things vs
>>>> many large things yields a problem where indexes and so on don't have
>>>> a lot of leverage on the problem.
>>>>
>>>> On Tue, Jul 2, 2013 at 6:39 AM, Evan Martin
>>>> <postgresql at realityexists.net> wrote:
>>>>> Hi,
>>>>>
>>>>> I have tables of ~25,000 polygons and ~80,000 lines and I want to
>>>>> find which
>>>>> lines intersect which polygons using PostGIS 2.1. Both are
>>>>> geographies and
>>>>> can span the dateline. Doing this the simple way using
>>>>> ST_Intersects(geog,
>>>>> geog) takes about 3 hours on my machine and I'd to see if there's a
>>>>> way to
>>>>> speed this up.
>>>>>
>>>>> I already have indexes on the geography columns and one of them is 
>>>>> being
>>>>> used (the one on the lines). Each line only has 2 points, but the
>>>>> polygons
>>>>> have anywhere from 4 to 20,000 points and some of them are very
>>>>> large. It
>>>>> would be OK to miss some of the smaller intersections (ie. where 
>>>>> the two
>>>>> only just barely intersect), but I wouldn't want the query to return
>>>>> false
>>>>> positives. In fact, ideally, I'd like to find only the lines that
>>>>> "substantially" intersect a polygon, eg. at least x km or x% of the
>>>>> line is
>>>>> in the polygon, but finding any intersections at all would be a 
>>>>> start.
>>>>>
>>>>> One trick I tried is ST_SimplifyPreserveTopology. I used that to 
>>>>> create
>>>>> simplified version of the polygons (at least those that don't span 
>>>>> the
>>>>> dateline) and check those first, then if they intersect then check
>>>>> the real
>>>>> polygons. This seems to work, but the performance gains are marginal
>>>>> compared to the simple approach.
>>>>>
>>>>> Is there another trick I can use to do this faster? I know
>>>>> ST_Intersects()
>>>>> internally calls ST_Distance(), which calculates the distance to a
>>>>> fraction
>>>>> of a metre. I don't need that kind of precision, so surely there's 
>>>>> some
>>>>> "shorcut" to be found?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Evan



More information about the postgis-users mailing list