[postgis-devel] An Experiment

Mon Sep 21 14:43:03 PDT 2009

Kevin,

Many many moons ago, you and Chris showed me the visual representation
of an index, and lo, it had a damn funny layout compared to what we
expected.

Having now gone through the picksplit code with a fine comb, I'm
pretty sure I know why. I don't have any general means of preventing
that (an R*Tree would be awesome, but GiST lacks the infrastructure to
build one, at this time) I do have an observation it would be fun to
see tested.

First, note that spatial data tends to be delivered to databases in
shape files, and that the data in those files has a tendency to be
spatially autocorrelated. (Watch the draw in JUMP or ArcView.)
Spatially autocorrelated data is pretty much guaranteed to generate
screwed up R-Trees, since the top nodes are going to be built around
one small area, then expanded in odd ways to fit new data as is fills
in areas outside the original node boundary.

A better R-Tree is going to be build against randomized inputs. So,
try two tables, one loaded and built the old fashioned way, another
built by creating a second table with "create table random_mytable as
select * from mytable order by random()". Obviously you'll need enough
records in the tables to make it interesting (bearing in mind there
are hundreds of entries in each index node).

If you get a chance to run those, I'd love to hear how it goes :)

Paul

PS - Of course, once the random-based index is built, clustering the
table against the index to bring the tuples back into some kind of
spatial correlation will be necessary for performance purposes.
Probably all these minor fine-tuning things make ****-all difference
in the big picture. But it's fun to fiddle.