[postgis-users] Calculating the Distance between Points

j.boochs at gmx.de j.boochs at gmx.de
Thu Nov 22 04:46:57 PST 2007


Hi, 

i want to calculate the distance between points using postgis. 
Here are the problems that I´ve discovered: 

"I want to get all points from TAB_B, that are in a range of m meteres from some points in TAB_A." 
distance() works only for one SRID for all points. transform() has to be used for the projection to one UTM zone. 

Index creation has to be done for that one UTM zone. If the data spans more than one UTM zone, problems start to arise. 
For each UTM zone an index in each table has to be created. Data has to be dissected. Queries covering more than one UTM zone are not possible. 

A work around is to use one SRID and tranform all points to that SRID although this seems to lead to distortions. 

My question is: "Is there a better solution to use distance() for multiple UTM zones than the work around to use one SRID an transform all points to that SRID"?

Thank you
Jens


Here is the detailed description of my problem: 

I have two tables where my points are in: TAB_A and TAB_B. 
All my coordinates are returned from a geocoder (for example google) in WGS84 (SRID = 4326). 
So the tables are created like this: 

create table TAB_A ( 
                  col1... 
                  col2... 
); 
select addgeometrycolumn ('schema_name', TAB_A), 
'coord', 4326,'POINT', 2); 

TAB_B is created like TAB_A. 

I want to get all points from TAB_B, that are in a range of m meteres from some points in TAB_A. So the first try was to do something like this: 

select * from TAB_B b, TAB_A a 
where a.col1 =... and a.col2 like... 
and distance(a.coord,b.coord) < m 

That´s not working, because the points have to be projected. Here are some notes, I found in the postgis mailing archives:

>>any kind of distance calculations are going to be much easier in a >>projected cs like UTM 
>>if you could transform to a non-degree projection, that would be a lot >>more efficient 
>>the best way is to reproject them in a projected coordinate system and >>then to measure the distance. 
>>projecting your data to a suitable meter based SRID using the transform >>function  
>>As we've said before, you need to find a projected coordinate system -- >>that is one where the sphere has been converted to a >>flat surface.

>>distance_sphere and distance_spheroid are *much* slower operations than >>distance 
>>use distance() of distance_sphere() over distance_spheroid() as the are >>fast in that order 

I´m dealing with coordinates within Germany so UTM zone 32N and 33N are relevant. Most of Germany is located within 32N, so SRID = 32632 is the one to choose. So this leads to the second try:

select * from TAB_B b, TAB_A a 
where a.col1 =... and a.col2 like... 
and distance(transform(a.coord,32632),transform(b.coord,32632)) < m 

However, I don´t know how big the error is for coordinates that belong to 33N, but are converted to 32N. And for bigger countries like the US it even get´s worse. That´s once again what I found in the postgis mailing archives:

>>Are you in one area that is covered by the same UTM zone or across the >>country? 
>>This appears to suggest that i should pick an SRID based on one point in >>my data set.  However, since the data set is national, 
>>it includes points from all around the USA (i.e. at least 4 UTM zones). >>So, if I pick the SRID for, say, Mountain Time, will 
>>that horribly distort points on the east and/or west coast?  is there no >>SRID that is perhaps less accurate in any one time 
>>zone, but more accurate across all time zones in the [continental] US? 

Ok, than I found a function that gives the SRID for the UTM zone from a point: 
>>http://wiki.postgis.org/support/wiki/index.php?plpgsqlfunctions has a >>function that can be used to say what utm zone a lat/lon >>is in 

After extending postgis with this utmzone() function, the next try was: 

select * from TAB_B b, TAB_A a 
where a.col1 =... and a.col2 like... 
and distance(transform(a.coord,utmzone(a.coord)),transform(b.coord,utmzone(b.coord))) < m 

But that´s not working, because the distance() function doesn´t accept geometries with different SRIDs. That´s what I found in the mailing archive about this: 
>>ERROR:  Operation on two GEOMETRIES with different 
>>PostGIS does some sanity checks to ensure you're not operating on >>different SRIDs (which would be meaningless). 

Dissecting my data into 33N and 32N would be an other try. But than I can only query for points that are in the same zone.

I wouldn´t be able to include points from two zones in one query. Here´s a comment from the mailing archive on this: 
>>What area does your data cover?  Does it cover the whole world or just a 
>>limited region like a country?  If you can transform to a meter based 
>>projection, that would be the most efficient.  That won't work 
>>unfortunately if you need to cover the whole world.  In which case you 
>>may want to dissect your data into different quadrants and project each 
>>separately. 

Further more, when it comes to performance and index optimization other problems arise with distance() together with transform().

Using && and expand() from the postigs tutorial for the above problem results in: 

select * from TAB_B b, TAB_A a 
where a.col1 =... and a.col2 like... 
and distance(transform(a.coord,32632),transform(b.coord,32632)) < m 
and expand(transform(a.coord,32632)) && transform(b.coord,32632) 
and transform(a.coord,32632)) && expand(transform(b.coord,32632)) 
and expand(transform(a.coord,32632)) && expand(transform(b.coord,32632)) 

Points form TAB_A and TAB_B are expanded with a bounding box, to quickly reduce the result set and reduce the number of calls to distance(). 

I don´t know how the index is used internally so my question is: 
Has the usage of two bounding boxes "expand(transform(a.coord,32632)) && expand(transform(b.coord,32632))" an impact on the usage of indexes, i.e. is this and-condition reducing the result set?

Great everything works, but the index is not used. The solution is, that e an index for the projected / transformed version has to be used. Here is a note on this issue:

>>This is just a guess, but your query is probably not using the index you 
>>created.  You should check this using explain, but if you created the 
>>index on the original column and not the transformed version, it won't 
>>be able to use the index.  If it's not using the index, try creating an 
>>index like this... 

Ok, creating the index for the transformed version goes like this: 
create index myIndex_TAB_A_ForUtm_32N; 
                on TAB_A 
                using gist(transform(coord,32632)) 
create index myIndex_TAB_B_ForUtm_32N; 
                on TAB_B 
                using gist(transform(coord,32632)) 

With that solution I´m limited to one SRID. Indexes have to be created exactly for that SRID. But it works explain shows the an index scan is used.


So putting it all together for distance() and coming back to the main question: 
"I want to get all points from TAB_B, that are in a range of m meteres from some points in TAB_A." 

distance() works only for one SRID for all points. transform() has to be used for the projection to one UTM zone. Index creation has to be done exactly for that one UTM zone. 

If the data spans more than one UTM zone problems start to arise: 
For each UTM zone an index in each table has to be created. 
Data has to be dissected. Queries covering more than one UTM zone are not possible. 

A work around is to use one SRID and tranform all points to that SRID although this seems to lead to distortions. I'm not satisfied with that solution. My question is: "Is there a better solution to use distance() for multiple UTM zones than the work around to use one SRID an transform all points to that SRID"?

-- 
Ist Ihr Browser Vista-kompatibel? Jetzt die neuesten 
Browser-Versionen downloaden: http://www.gmx.net/de/go/browser



More information about the postgis-users mailing list