[postgis-users] ST_Distance_Sphere too slow

Julian Perelli jperelli at gmail.com
Thu Mar 24 16:35:31 PDT 2011


2011/3/24 Stephen Woodbridge <woodbri at swoodbridge.com>:
> On 3/24/2011 6:14 PM, Julian Perelli wrote:
>>
>> Thanks!
>>
>> I don't know if st_expand is faster than st_distance_sphere but with
>> EXPLAIN I saw 2 seq scan, and with your approach I see now one seq
>> scan, and one index scan, so I suppose your approach is much better,
>> and I think that it couldn't be better.
>>
>> Could you explain me why that happens? why now is a index scan where
>> before was a seq scan?
>
> Spatial indexes only work on the bbox of the geometry and the && is try if
> the two geometry bboxes interact with one another.
>
> So if you hav geometry A and B then A && B compares their bboxes. But in
> your case you want to find any two geometries that might be as far apart as
> 300 meters. If A and B are 300 meters apart then their bboxes will not
> interact and you will never be able to select them for computing the
> distance. But if we expand the bbox of one of the geometries by 300 meters,
> then it will interact with the bbox in question and we will check the
> distance. If the distance is greater then 300m we will throw it out anyway.

Ok, I understand now. && works with bboxes, and st_expand doesn't have
almost any cost.

>
> So && is a fast select using indexes and bboxes.
> The distance is not performed on all the pairs that are obviously too far
> apart, and since this is an expense computation this is good.
>
> Performance is based more on the false index matches that get rejected by
> the more costly distance calculation. If there is not a high percentage of
> these then index searchs are extremely fast.>
> -Steve

Unfortunately, I have too many overlapping paths, but you are giving
me here an idea:

I can first see what paths cross each other, and that (I suppose) Is
faster than the distance calculation. Then with the rest of the
results I can take the distance. (I don't know if the internals on
ST_distance do that already, then my idea is useless :P) I'll try it,
post the code and say how it goes.

Thanks!
>
>> I made a subset table (only 10 paths!!) to test.
>> with your SQL it takes 2,8 sec
>> with mine it takes 5,0 sec to complete.
>>
>> with a subset of 100 paths I let it for an hour and it doesn't finished.
>> the problem is that it's almost unacceptable, with 1000 paths it will
>> take eternity!
>> (I think it takes exponential time as the subset grows up because of
>> the self join)
>>
>> maybe is there another way to do this faster... I dont know...
>>
>> I want to run this just once, to make a table with this temporary
>> results for a larger query in my application, so if it takes 12 hours
>> to complete this operation, is ok.
>>
>> 2011/3/24 Stephen Woodbridge<woodbri at swoodbridge.com>:
>>>
>>> I would make sure there is an gist index on path and use a query like:
>>>
>>>  select
>>>    t1.id,
>>>    t2.id
>>>  from
>>>    table t1
>>>      inner join table t2
>>>        on (t1.path&&  st_expand(t2.path, 300/111120) and t1.id!=t2.id and
>>> ST_Distance_Sphere(t1.path, t2.path)<  300);
>>>
>>> the 300/111120 is to convert 300 meters into approximately degrees. If
>>> you
>>> are worried about some near misses you can expand it a little more and
>>> distance_sphere will filter extras out of the results.
>>>
>>> -Steve W
>>>
>>> On 3/24/2011 3:43 PM, Julian Perelli wrote:
>>>>
>>>> Hello Postgis List!
>>>>
>>>> I'm trying to get the pair of paths that crosses each other or are 300
>>>> meters or less distant from each other, making this SQL.
>>>>
>>>>   select
>>>>     t1.id,
>>>>     t2.id
>>>>   from
>>>>     table t1
>>>>       inner join table t2
>>>>         on (t1.id!=t2.id and ST_Distance_Sphere(t1.path, t2.path)<
>>>>  300);
>>>>
>>>> it was 14 hours running and it doesn't finish... I have 1200+ rows in
>>>> the table, each path has between 100 and 500 points.
>>>>
>>>> I tried to make an index on the path column, but when I use explain on
>>>> the query, it seems that pg doesn't use the index.
>>>>
>>>> should I increase the memory assigned to pgsql?
>>>>
>>>> I don't know where to begin, what to do, to make this query faster.
>>>> Maybe I have an error and it just hangs up.
>>>> It would be nice to know how to debug the query.. to see it running or
>>>> something like that. EXPLAIN helps, but not too much.
>>>>
>>>> Regards!
>>>> _______________________________________________
>>>> postgis-users mailing list
>>>> postgis-users at postgis.refractions.net
>>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list