[postgis-users] High Concurrency R* in GiST?

Mon Dec 5 13:54:13 PST 2011

On Mon, Dec 5, 2011 at 2:09 PM, Paul Ramsey <pramsey at opengeo.org> wrote:

> On Mon, Dec 5, 2011 at 11:05 AM, C. Mundi <cmundi at gmail.com> wrote:
>
> > I get the impression that GiST hides a lot of
> > implementation details.  So I am hungry for details which will help me
> > assess postGIS/postgreSQL for my application.
>
> This is the key point, and it is so: the physical implementation
> details are hidden behind the GiST API. As a result the R-Tree
> implementation is a "standard" one, not an R* (though the split method
> is Ang/Tan, not Guttman). And as a result you can't do things like
> rebalance the tree as specified in the R* recipe. The GiST API really
> is quite narrow. You have the consistent function to control reads and
> the compress/picksplit functions to control writes.
>
> So if you're looking for optimal tree construction you've come to the
> wrong place. The primary benefit of the PostGIS indexing system is not
> its optimal nature but its existence: it's already here, you can
> insert and query data with simple SQL, it does do locking and
> consistent operations thanks to the postgresql infrastructure wrapped
> around it.
>
> As an architect my recommendation would be: since the development
> overhead in building your system from scratch will be quite high,
> investing the time into a load test on PostGIS first could save you a
> lot of time if it turns out that even our imperfect system is actually
> good enough to meet your needs.
>
> Best,
>
> P.
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
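
To make the narrow GiST interface concrete: an operator class only answers local questions about keys (its support functions are consistent, union, compress, decompress, penalty, picksplit and same), while the GiST core owns the tree walk, page splits and locking. The Python below is a hypothetical toy that mimics three of those callbacks for 2-D boxes. It is not PostgreSQL's C API, and the area-enlargement penalty is an assumption chosen for illustration. Nothing in this kind of interface lets a callback ask the core to pull entries out and reinsert them, which is what R*-style forced reinsertion would need.

```python
from typing import List, Tuple

# Hypothetical toy model of per-key GiST callbacks for 2-D boxes (not PostgreSQL's C API).
Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def union(boxes: List[Box]) -> Box:
    """'union': the smallest box covering every entry."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def consistent(key: Box, query: Box) -> bool:
    """'consistent' for an overlap search: could anything under `key` match?
    Returning False lets the search skip that whole subtree."""
    return not (key[2] < query[0] or query[2] < key[0] or
                key[3] < query[1] or query[3] < key[1])


def penalty(key: Box, new: Box) -> float:
    """'penalty': cost of putting `new` under `key`; here the classic
    R-tree area enlargement (an assumption for illustration)."""
    return area(union([key, new])) - area(key)


# The tree walk belongs to the GiST core; the opclass only answers the questions above.
# Nodes are hypothetical dicts: {"leaf": bool, "entries": [(box, child_or_payload), ...]}
def search(node: dict, query: Box, out: list) -> None:
    for key, child in node["entries"]:
        if consistent(key, query):
            if node["leaf"]:
                out.append(child)
            else:
                search(child, query, out)


def choose_subtree(node: dict, new: Box) -> dict:
    """Insertion descends into the child whose key has the lowest penalty."""
    return min(node["entries"], key=lambda entry: penalty(entry[0], new))[1]


leaf_a = {"leaf": True, "entries": [((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")]}
leaf_b = {"leaf": True, "entries": [((10, 10, 11, 11), "c")]}
root = {"leaf": False, "entries": [(union([(0, 0, 1, 1), (2, 2, 3, 3)]), leaf_a),
                                   ((10, 10, 11, 11), leaf_b)]}

hits: list = []
search(root, (0.5, 0.5, 2.5, 2.5), hits)
print(hits)  # ['a', 'b']
print(choose_subtree(root, (2.5, 2.5, 2.6, 2.6)) is leaf_a)  # True
```

The search never visits leaf_b because its key fails consistent, and choose_subtree sends the new box to leaf_a because extending (0, 0, 3, 3) to cover it costs no extra area.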

Aha!  Paul, thank you very much.  This is a big part of what I need to
know, even though it's not exactly what I wanted to hear.  I already know
from testing that Ang/Tan splits can (not always but often enough to hurt
in a way I can measure in money) crush the performance for the natural
patterns of data+queries in the problem domain.  The fundamental reason is
that the "minor nuisance" of occasionally suboptimal splits is "leveraged"
across all compute cores.
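
A simplified Python sketch of the Ang/Tan linear split helps show how a split can be "occasionally suboptimal". It is written from the published description of the algorithm, not from PostGIS's C code, so treat the details as assumptions. Along each axis it sends every entry toward the nearer edge of the overflowing node's bounding box, then keeps the axis that divides the entries most evenly by count, looking at overlap and total area only to break ties.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def union(boxes: List[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlap_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def ang_tan_split(entries: List[Box]) -> Tuple[List[Box], List[Box]]:
    """Simplified Ang/Tan linear split of an overflowing node's entries."""
    mbr = union(entries)
    left, right, bottom, top = [], [], [], []
    for e in entries:
        # Along each axis, send the entry toward the nearer edge of the node's box.
        (left if e[0] - mbr[0] < mbr[2] - e[2] else right).append(e)
        (bottom if e[1] - mbr[1] < mbr[3] - e[3] else top).append(e)

    def rank(groups):
        a, b = groups
        if not a or not b:
            return (len(entries), float("inf"), float("inf"))
        ua, ub = union(a), union(b)
        # Evenness first; overlap and total area only break ties.
        return (max(len(a), len(b)), overlap_area(ua, ub), area(ua) + area(ub))

    best = min([(left, right), (bottom, top)], key=rank)
    if not best[0] or not best[1]:
        # Degenerate input (e.g. identical boxes): halve by x-centre.
        # This fallback is an added assumption, not part of the published algorithm.
        ordered = sorted(entries, key=lambda e: e[0] + e[2])
        half = len(ordered) // 2
        return ordered[:half], ordered[half:]
    return best


# Three boxes hugging the left edge, one small box at the right edge.
entries = [(0, 0, 1, 6), (0, 4, 1, 10), (0, 1, 1, 3), (9, 4, 10, 6)]
g1, g2 = ang_tan_split(entries)
print(g1, g2)                              # [(0, 0, 1, 6), (0, 1, 1, 3)] [(0, 4, 1, 10), (9, 4, 10, 6)]
print(overlap_area(union(g1), union(g2)))  # 2
```

Here the split along x (three boxes against one) would have zero overlap, but its 3/1 imbalance loses to the 2/2 split along y, whose two child boxes overlap by 2 square units. Every query window that lands in that overlap has to descend into both children.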

So my new strategy is to live with the limits as you've suggested while
looking for a suitable implementation of R*.  Since I'm not a database
architect, any system I might build from scratch would be much worse than
expensive -- it would be broken!  So I'd love to know about any practical
R* implementations for any FOSS DB.  For that matter, I'd like to at least
learn of commercial ones as well.
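
For comparison, the split step of the R* recipe (Beckmann et al., 1990) can be sketched in a few dozen lines. ChooseSplitAxis picks the axis whose candidate splits have the smallest summed perimeter ("margin"), and ChooseSplitIndex then picks the split along that axis with the least overlap, breaking ties by total area. This sketch covers only the split. It leaves out forced reinsertion and the overlap-aware ChooseSubtree, which are the parts that need control over the whole tree. The 40% minimum fill and all helper names are assumptions for illustration.

```python
from typing import Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Split = Tuple[List[Box], List[Box]]


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def margin(b: Box) -> float:
    return 2 * ((b[2] - b[0]) + (b[3] - b[1]))


def union(boxes: List[Box]) -> Box:
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def overlap_area(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)


def distributions(entries: List[Box], axis: int, m: int) -> Iterator[Split]:
    """All candidate splits along one axis (0 = x, 1 = y): sort by lower,
    then by upper bound, and cut so each group keeps at least m entries."""
    lo, hi = axis, axis + 2
    for sort_key in (lambda e: (e[lo], e[hi]), lambda e: (e[hi], e[lo])):
        ordered = sorted(entries, key=sort_key)
        for k in range(m, len(ordered) - m + 1):
            yield ordered[:k], ordered[k:]


def rstar_split(entries: List[Box], min_fill: float = 0.4) -> Split:
    m = max(1, int(len(entries) * min_fill))  # minimum group size (assumed 40%)

    # ChooseSplitAxis: the axis whose candidate splits have the smallest total margin.
    def margin_sum(axis: int) -> float:
        return sum(margin(union(a)) + margin(union(b))
                   for a, b in distributions(entries, axis, m))

    axis = min((0, 1), key=margin_sum)

    # ChooseSplitIndex: least overlap between the two groups, then least total area.
    def cost(split: Split):
        ua, ub = union(split[0]), union(split[1])
        return (overlap_area(ua, ub), area(ua) + area(ub))

    return min(distributions(entries, axis, m), key=cost)


entries = [(0, 0, 1, 6), (0, 4, 1, 10), (0, 1, 1, 3), (9, 4, 10, 6)]
g1, g2 = rstar_split(entries)
print(g1, g2)  # [(0, 0, 1, 6), (0, 4, 1, 10), (0, 1, 1, 3)] [(9, 4, 10, 6)]
```

On these four boxes, chosen to be awkward for a split that balances counts first, the margin criterion picks the x axis, and the overlap criterion then keeps the three left-edge boxes together, so the two child boxes do not overlap at all.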

Many thanks for sharing your time and considerable insight!

Best,
Carlos