[postgis-users] ST_Distance_Sphere too slow

Stephen Woodbridge woodbri at swoodbridge.com
Thu Mar 24 21:44:07 PDT 2011


On 3/24/2011 7:35 PM, Julian Perelli wrote:
> 2011/3/24 Stephen Woodbridge<woodbri at swoodbridge.com>:
>> On 3/24/2011 6:14 PM, Julian Perelli wrote:
>>>
>>> Thanks!
>>>
>>> I don't know if st_expand is faster than st_distance_sphere but with
>>> EXPLAIN I saw 2 seq scan, and with your approach I see now one seq
>>> scan, and one index scan, so I suppose your approach is much better,
>>> and I think that it couldn't be better.
>>>
>>> Could you explain me why that happens? why now is a index scan where
>>> before was a seq scan?
>>
>> Spatial indexes only work on the bbox of the geometry and the&&  is try if
>> the two geometry bboxes interact with one another.
>>
>> So if you hav geometry A and B then A&&  B compares their bboxes. But in
>> your case you want to find any two geometries that might be as far apart as
>> 300 meters. If A and B are 300 meters apart then their bboxes will not
>> interact and you will never be able to select them for computing the
>> distance. But if we expand the bbox of one of the geometries by 300 meters,
>> then it will interact with the bbox in question and we will check the
>> distance. If the distance is greater then 300m we will throw it out anyway.
>
> Ok, I understand now.&&  works with bboxes, and st_expand doesn't have
> almost any cost.

Right, st_expand only grows the bbox size, so no cost in doing that.

>>
>> So&&  is a fast select using indexes and bboxes.
>> The distance is not performed on all the pairs that are obviously too far
>> apart, and since this is an expense computation this is good.
>>
>> Performance is based more on the false index matches that get rejected by
>> the more costly distance calculation. If there is not a high percentage of
>> these then index searchs are extremely fast.>
>> -Steve
>
> Unfortunately, I have too many overlapping paths, but you are giving
> me here an idea:
>
> I can first see what paths cross each other, and that (I suppose) Is
> faster than the distance calculation. Then with the rest of the
> results I can take the distance. (I don't know if the internals on
> ST_distance do that already, then my idea is useless :P) I'll try it,
> post the code and say how it goes.

No this will not work, intersects is more costly than distance. Are you 
paths, straight line segments or linstrings with multiple vertices? Are 
you paths fairly random or mostly parallel at a mostly common angle?

45 degrees lines have a large bbox, but horizontal and vertical lines 
have small bboxes. If you rotate your geometry so most of the paths are 
horizontal or vertical and save it in a temp table and index it, it 
would then probably run much faster even given building the temp table.

-Steve

> Thanks!
>>
>>> I made a subset table (only 10 paths!!) to test.
>>> with your SQL it takes 2,8 sec
>>> with mine it takes 5,0 sec to complete.
>>>
>>> with a subset of 100 paths I let it for an hour and it doesn't finished.
>>> the problem is that it's almost unacceptable, with 1000 paths it will
>>> take eternity!
>>> (I think it takes exponential time as the subset grows up because of
>>> the self join)
>>>
>>> maybe is there another way to do this faster... I dont know...
>>>
>>> I want to run this just once, to make a table with this temporary
>>> results for a larger query in my application, so if it takes 12 hours
>>> to complete this operation, is ok.
>>>
>>> 2011/3/24 Stephen Woodbridge<woodbri at swoodbridge.com>:
>>>>
>>>> I would make sure there is an gist index on path and use a query like:
>>>>
>>>>   select
>>>>     t1.id,
>>>>     t2.id
>>>>   from
>>>>     table t1
>>>>       inner join table t2
>>>>         on (t1.path&&    st_expand(t2.path, 300/111120) and t1.id!=t2.id and
>>>> ST_Distance_Sphere(t1.path, t2.path)<    300);
>>>>
>>>> the 300/111120 is to convert 300 meters into approximately degrees. If
>>>> you
>>>> are worried about some near misses you can expand it a little more and
>>>> distance_sphere will filter extras out of the results.
>>>>
>>>> -Steve W
>>>>
>>>> On 3/24/2011 3:43 PM, Julian Perelli wrote:
>>>>>
>>>>> Hello Postgis List!
>>>>>
>>>>> I'm trying to get the pair of paths that crosses each other or are 300
>>>>> meters or less distant from each other, making this SQL.
>>>>>
>>>>>    select
>>>>>      t1.id,
>>>>>      t2.id
>>>>>    from
>>>>>      table t1
>>>>>        inner join table t2
>>>>>          on (t1.id!=t2.id and ST_Distance_Sphere(t1.path, t2.path)<
>>>>>   300);
>>>>>
>>>>> it was 14 hours running and it doesn't finish... I have 1200+ rows in
>>>>> the table, each path has between 100 and 500 points.
>>>>>
>>>>> I tried to make an index on the path column, but when I use explain on
>>>>> the query, it seems that pg doesn't use the index.
>>>>>
>>>>> should I increase the memory assigned to pgsql?
>>>>>
>>>>> I don't know where to begin, what to do, to make this query faster.
>>>>> Maybe I have an error and it just hangs up.
>>>>> It would be nice to know how to debug the query.. to see it running or
>>>>> something like that. EXPLAIN helps, but not too much.
>>>>>
>>>>> Regards!
>>>>> _______________________________________________
>>>>> postgis-users mailing list
>>>>> postgis-users at postgis.refractions.net
>>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>




More information about the postgis-users mailing list