[postgis-users] Space partitioning index for points
Bryce L Nordgren
bnordgren at fs.fed.us
Tue Jul 6 11:30:23 PDT 2010
postgis-users-bounces at postgis.refractions.net wrote on 07/02/2010 12:26:43
> GeoHash doesn't cut it? It's space partitioned, just not as balanced
> as a kd-tree, but for points... no? I'll look up my info on the GiST
> work that was done, but note that it didn't make it into 9.0, but into
> 9.1, and as I recall Oleg and co did not seem to think the kd
> implementation was going to be very effective.
At this point I'm just considering solutions on the conceptual level to
identify promising candidates. GeoHash is promising. However, my issue
with GeoHash is that it linearizes a 2d problem, much like space-filling
curves. (i.e., you never know how far away a "close" geohash-value is;
conversely: identifying all the "near" geohashes presents an issue).
Besides: GeoHash is global and our study is regional. 2/3 or more of the
hash values will be empty, while the others will be chock full of points.
Also, our region extends from 20-80 degrees N. GeoHashes aren't a
constant-distance metric, they're constant-angle. We're going to have fat
cells toward the south and skinny ones in the north.
The most conceptually pleasing approach for this proximity-based function
would be to operate on the kd tree itself, minimizing the number of
searches. At most, a "proximity" search from a known point in the tree may
have to pop up a couple of levels (to check the neighbors), and the _same_
proximity search could be re-used for all nodes under the same subtree.
Directly navigating a spatial index which retains the spatial identity of
the indexed points seems like the most efficient way to do this.
I guess in database terms, I'm talking about implementing a new "cursor"
which leverages a hierchical structure like the kd-tree (to walk thru the
points using the proximity information in the tree). Simultaneously:
implementing a caching proximity search on the tree itself which can
re-use previous searches from the same subtree. That way, proximity
searches from points under the same subtree needn't be repeated.
This is perhaps not a topic for the users list. If I flesh the idea out
more, I'll use the wiki. We've been granted a reprieve for the moment by
selecting a "case study" which needs only 3 million out of 90 million
points. However, we're eventually going to have to suck it up and process
everything. So we've got a bit of time to mull it over; and/or review the
literature for smart ways to use a kd-tree/other scheme.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the postgis-users