[postgis-users] ST_DWithin on 2 columns which are the same

Tue Mar 1 18:23:46 PST 2011

Robert,

On 01/03/2011, at 4:47 PM, robertvc wrote:

> I have been trying, with no luck, to implement a query to return me all the
> pairs of rows that are within a certain range of each other. I searched the
> forum before trying to find a solution but I haven't been able to. Sorry if
> there is a solution to this problem already that I didn't see. 

I think you need to search on nearest neighbour.

> Suppose you have a 2 column table cars with an integer ID, and a geography
> Location representing the position of the car globally. I want to write a
> query that will return me pairs of IDs of cars that are within say 1km of
> each other. Having looked at the functions available ST_DWithin seems the
> obvious choice but I haven't been able to actually use it for what I want. 
> 
> I've started out by simply testing if it matches each car as being in range
> with itself : 
> 
> SELECT * FROM cars WHERE ST_DWithin(location, location, 1);
> 
> This returned all the entries from the cars table as expected (given that no
> cars where actually within a meter of each other).

This doesn't sound like my interpretation - it returned all the entries because each car's location was within 1 metre of itself (unsurprisingly). 

> I then tried to find all
> the cars that are within a km of each other but don't have the same ID (to
> avoid matching a car with itself). Because of this extra constraint I need
> to somehow treat the location columns individually so I've tried the
> following: 
> 
> SELECT c1.id, c2.id FROM cars AS c1, cars AS c2 WHERE
> ST_DWithin(c1.location, c2.location, 1000) AND c1.id != c2.id; 
> 
> But this query never actually finishes computing (the number of entries in
> my cars table is around 30k and after 2 hours of the query being executed I
> still didn't get back a result). I would greatly appreciate any help in
> computing this query as well as any tips on performance. I should also
> probably mention that I did an indexing on the location column as suggested
> in the documentation: 
> 
> CREATE INDEX cars_gix ON cars USING GIST (location);

Did you also VACUUM ANALYZE after creating the index?

Performance-wise, using geometry instead of geography might help. You could try EXPLAIN to see where the slow point in the query is.

How widespread are the cars - are you talking about 30 000 cars that might all be within 1 km of each other?

Given that the direction of the relation doesn't matter maybe use an outer join something like 

SELECT c1.id, c2.id 
FROM cars c1
LEFT OUTER JOIN cars c2 ON ST_DWithin(c1.location, c2.location, 1000)
AND c2 IS NOT NULL AND c1.id <> c2.id;

(I have no idea if that would work!)

cheers

Ben