[postgis-users] Space partitioning index for points

Tue Jul 6 11:30:23 PDT 2010

postgis-users-bounces at postgis.refractions.net wrote on 07/02/2010 12:26:43 
AM:

> GeoHash doesn't cut it? It's space partitioned, just not as balanced
> as a kd-tree, but for points... no? I'll look up my info on the GiST
> work that was done, but note that it didn't make it into 9.0, but into
> 9.1, and as I recall Oleg and co did not seem to think the kd
> implementation was going to be very effective.

At this point I'm just considering solutions on the conceptual level to 
identify promising candidates. GeoHash is promising. However, my issue 
with GeoHash is that it linearizes a 2d problem, much like space-filling 
curves. (i.e., you never know how far away a "close" geohash-value is; 
conversely: identifying all the "near" geohashes presents an issue). 
Besides: GeoHash is global and our study is regional. 2/3 or more of the 
hash values will be empty, while the others will be chock full of points. 
Also, our region extends from 20-80 degrees N. GeoHashes aren't a 
constant-distance metric, they're constant-angle. We're going to have fat 
cells toward the south and skinny ones in the north.

The most conceptually pleasing approach for this proximity-based function 
would be to operate on the kd tree itself, minimizing the number of 
searches. At most, a "proximity" search from a known point in the tree may 
have to pop up a couple of levels (to check the neighbors), and the _same_ 
proximity search could be re-used for all nodes under the same subtree. 
Directly navigating a spatial index which retains the spatial identity of 
the indexed points seems like the most efficient way to do this. 

I guess in database terms, I'm talking about implementing a new "cursor" 
which leverages a hierchical structure like the kd-tree (to walk thru the 
points using the proximity information in the tree). Simultaneously: 
implementing a caching proximity search on the tree itself which can 
re-use previous searches from the same subtree. That way, proximity 
searches from points under the same subtree needn't be repeated.

This is perhaps not a topic for the users list. If I flesh the idea out 
more, I'll use the wiki. We've been granted a reprieve for the moment by 
selecting a "case study" which needs only 3 million out of 90 million 
points. However, we're eventually going to have to suck it up and process 
everything. So we've got a bit of time to mull it over; and/or review the 
literature for smart ways to use a kd-tree/other scheme.

Bryce 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20100706/313e29b6/attachment.html>