[postgis-users] speeding up simple pt-in-poly lookups

Mr. Puneet Kishor punk.kish at gmail.com
Thu Dec 22 21:04:36 PST 2011


On Dec 22, 2011, at 10:55 PM, Martin Davis wrote:

> Interesting that using ST_Simplify makes such a big difference.
> 
> Would it be possible for you to post the sizes of the target geometries before and after simplification/buffering?  


What do you mean by "size"? If you mean "number of points," the original geometry (North America continent) had circa 50K points, and the simplified geometry had around 2000, so an order of magnitude difference. I am speaking from memory because I am not in front of my db server. Will check again tomorrow, and repost if the numbers are drastically different from above.


> This might help in determining a better solution for the future.
> 
> On 12/21/2011 12:02 PM, Puneet Kishor wrote:
>> Thanks everyone. Learned a lot. For now, I have decided to not go down the chop-and-reconstruct path. The following query does the job in 1.8 secs which is a pretty nice improvement over the previous 200+ seconds.
>> 
>> SELECT Count(*)
>> FROM collections c
>> WHERE ST_Intersects(
>> 	c.the_geom, (
>> 		SELECT ST_Buffer(ST_Simplify(n.the_geom, 0.75), 0.75)
>> 		FROM base.continents n
>> 		WHERE n.abbrev = 'NA'
>> 	)
>> );
>> 
>> 
>> On Dec 21, 2011, at 11:38 AM, pcreso at pcreso.com wrote:
>> 
>>> Puneet,
>>> 
>>> Chopping polygons is pretty simple, with a grid&  st_intersection(), but you can certainly generalise polygons to reduce the number of vertices&  size of objects to de-toast... beware however that if you do this then you are actually moving the polygon boundary,&  therefore a point very near a boundary may be inside the original country polygon but outside the generalised/simplified one.
>>> 
>>> You can address this by simplifying a buffer of the polygons, with the buffer very slightly larger than the simplify distance, so that every simplified version fully contains the original, but you will also have to check against the original polygons to confirm the point is genuinely inside the original.
>>> 
>>> As an alternative approach, you might also try selecting points where the distance from a polygon is zero, as the ST_distance uses stabbing line algorithm, and may be faster. The distance will be non-zero only for points outside the polygon.
>>> 
>>> Cheers,
>>> 
>>>   Brent Wood
>>> 
>>> On 21/12/2011, at 12:06 PM, Mr. Puneet Kishor wrote:
>>> 
>>>> On Dec 20, 2011, at 9:48 PM, Martin Davis wrote:
>>>> 
>>>>> For more detail check out this thread on the same issue:
>>>>> 
>>>>> http://postgis.refractions.net/pipermail/postgis-users/2011-November/031345.html
>>>> 
>>>> Thanks. Chopping up my coverage into hundreds of small regions is the last avenue I want to try. Going by the text of that email, it seems that "few, large, regions with many vertices (may be) the problem." I will try generalizing my continents so that I have "few, large regions with *very few* vertices" and see if that speeds up the SELECTs.
>>>> 
>>>> 
>>>>> On 12/20/2011 5:28 PM, Puneet Kishor wrote:
>>>>>> On Dec 20, 2011, at 7:21 PM, Paul Ramsey wrote:
>>>>>> 
>>>>>>> Chop up the continents into smaller pieces.
>>>>>>> 
>>>>>> hmmm... I am not sure I understand the above. And then what? UNION each smaller piece query?
>>>>>> 
>>>>>> 
>>>>>>> On Tue, Dec 20, 2011 at 4:35 PM, Puneet Kishor<punk.kish at gmail.com>   wrote:
>>>>>>>> This is probably a really basic question... my ST_Within or ST_Intersects selecting points in a continent are way too slow (both take upward of 200 secs).
>>>>>>>> 
>>>>>>>>        SELECT Count(c_id)
>>>>>>>>        FROM c, continents n
>>>>>>>>        WHERE ST_Intersects(c.the_geom, n.the_geom) AND
>>>>>>>>                n.continent = 'North America';
>>>>>>>> 
>>>>>>>> 
>>>>>>>> Both tables have gist indexes on the geometries. The above query has the following plan
>>>>>>>> 
>>>>>>>> "Aggregate  (cost=9.66..9.67 rows=1 width=4)"
>>>>>>>> "  ->    Nested Loop  (cost=0.00..9.66 rows=1 width=4)"
>>>>>>>> "        Join Filter: _st_intersects(c.the_geom, n.the_geom)"
>>>>>>>> "        ->    Seq Scan on continents n  (cost=0.00..1.10 rows=1 width=32)"
>>>>>>>> "              Filter: ((continent)::text = 'North America'::text)"
>>>>>>>> "        ->    Index Scan using pbdb__collections_the_geom on collections c  (cost=0.00..8.30 rows=1 width=104)"
>>>>>>>> "              Index Cond: (c.the_geom&&   n.the_geom)"
>>>>>>>> 
>>>>>>>> The table c has approx 120K rows, and the continents table has 8 rows.Suggestions on how I can improve this? Yes, the computer is otherwise very swift and modern.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Puneet Kishor
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>> 
>> 
>> -----
>> No virus found in this message.
>> Checked by AVG - www.avg.com
>> Version: 2012.0.1890 / Virus Database: 2109/4694 - Release Date: 12/21/11
>> 
>> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users




More information about the postgis-users mailing list