[postgis-devel] An Experiment

Mon Sep 21 20:32:46 PDT 2009

Not that I know anything about what any of you are talking about, but it
sounded similar to what Simon was talking about with Morton Keys.

So thought I would list it here in case it has relevance

http://www.spatialdbadvisor.com/oracle_spatial_tips_tricks/138/spatial-sorti
ng-of-data-via-morton-key/

Thanks,
Regina

-----Original Message-----
From: postgis-devel-bounces at postgis.refractions.net
[mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of Paul
Ramsey
Sent: Monday, September 21, 2009 7:42 PM
To: PostGIS Development Discussion
Subject: Re: [postgis-devel] An Experiment

This book

http://www.amazon.com/Foundations-Multidimensional-Structures-Kaufmann-Compu
ter/dp/0123694469

has a great little section on r-trees which gives capsule summaries of
various papers, and even better summaries of some of the empirical testing
that was done on different r-tree variants. Among other things, they found
that the Ang/Tan linear algorithm actually produced a better index than the
Guttman quadratic time algorithm, which is sort of counter-intuitive, but
c'est la vie.

And it also shows some visualizations which sure make one want to have an
R*Tree. However, the GiST infrastructure just doesn't provide that option.
So we're left with really very few knobs to twist (penalty and picksplit).

However, perhaps we are wasting our time :) there was a very very
interesting talk at GeoWeb this year by some guys selling a RDF
triple-store, about how they did their spatial index. And they basically did
a one-level grid system, but even simpler, just encoding things in strips,
and it performed, because the strips could be read in the fastest possible
way off the disk: linearly in one direction.

In a similar vein, the QIX structure used by Mapserver has some really
really dumb behavior for points, creating an absurdly deep tree by default,
and yet, doesn't really suffer from it. Our trouble is, we don't have a
solid enough handle on where our performance bottlenecks are, so we could
very well be focussing on the wrong (but very
interesting) thing by looking at, for example, tree structure instead of
disk access.

Another interesting performance test would be r-tree versus
b-tree-over-geohash. The latter is more naive and un-balanced, but has the
advantage (when clustered) of being nicely spatially autocorrelated, thereby
providing potentially better disk access profiles for our classic bounding
box query.

P.

On Mon, Sep 21, 2009 at 4:19 PM, Chris Hodgson <chodgson at refractions.net>
wrote:
> Hey Paul, it was actually almost a year ago that I looked into this 
> stuff. I started a blog to publish my findings but I realized that a 
> solution was probably outside the scope of postgresql/postgis and so I 
> didn't end up going any further with it. Also (as I'm sure many other 
> wannabe bloggers have discovered) I haven't done anything interesting 
> enough to blog about since :)
>
> http://cognitivelychris.blogspot.com/
>
> (I originally had it on chris.ca but I ended up selling that domain.)
>
> Anyways, I explain in there why the real problem is progressive 
> insertion of *variable density data*. It doesn't matter whether you 
> order it randomly or semi-geo-correllated - unless it is just perfect 
> (ie. you pre-process it with a magic process that knows about how the 
> index will be built) you end up with a relatively poor index because 
> the dense areas effectively take over the index and mess it up.
>
> I came up with some interesting algorithms to make what I envisioned 
> to be nearly perfectly balanced r-tree indexes but they require 
> sorting the bounding boxes for all of the data in the table at once, 
> which isn't necessarily feasible with large data sets and is 
> definitely outside of the postgresql workflow of inserting one record 
> at a time. I'll try to make another post with some of my prettier
pictures.
>
> Chris
>
> Paul Ramsey wrote:
>>
>> Kevin,
>>
>> Many many moons ago, you and Chris showed me the visual 
>> representation of an index, and lo, it had a damn funny layout 
>> compared to what we expected.
>>
>> Having now gone through the picksplit code with a fine comb, I'm 
>> pretty sure I know why. I don't have any general means of preventing 
>> that (an R*Tree would be awesome, but GiST lacks the infrastructure 
>> to build one, at this time) I do have an observation it would be fun 
>> to see tested.
>>
>> First, note that spatial data tends to be delivered to databases in 
>> shape files, and that the data in those files has a tendency to be 
>> spatially autocorrelated. (Watch the draw in JUMP or ArcView.) 
>> Spatially autocorrelated data is pretty much guaranteed to generate 
>> screwed up R-Trees, since the top nodes are going to be built around 
>> one small area, then expanded in odd ways to fit new data as is fills 
>> in areas outside the original node boundary.
>>
>> A better R-Tree is going to be build against randomized inputs. So, 
>> try two tables, one loaded and built the old fashioned way, another 
>> built by creating a second table with "create table random_mytable as 
>> select * from mytable order by random()". Obviously you'll need 
>> enough records in the tables to make it interesting (bearing in mind 
>> there are hundreds of entries in each index node).
>>
>> If you get a chance to run those, I'd love to hear how it goes :)
>>
>> Paul
>>
>> PS - Of course, once the random-based index is built, clustering the 
>> table against the index to bring the tuples back into some kind of 
>> spatial correlation will be necessary for performance purposes.
>> Probably all these minor fine-tuning things make ****-all difference 
>> in the big picture. But it's fun to fiddle.
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>>
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
_______________________________________________
postgis-devel mailing list
postgis-devel at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-devel