[postgis-users] High Concurrency R* in GiST?

Jochen Albrecht jochen.albrecht at gmail.com
Mon Dec 5 14:29:14 PST 2011


Have a look at http://www.cs.purdue.edu/spgist, Carlos. It was written for
a dated version of PostgreSQL, but given your background, it may be less
work than building something from scratch. It is a fairly low-level
implementation of Samet's 2006 bible and hence allows one to derive R* trees
according to section 2.1.5.2 of the book.
Cheers,
    Jochen


On Mon, Dec 5, 2011 at 4:54 PM, C. Mundi <cmundi at gmail.com> wrote:

>
>
> On Mon, Dec 5, 2011 at 2:09 PM, Paul Ramsey <pramsey at opengeo.org> wrote:
>
>> On Mon, Dec 5, 2011 at 11:05 AM, C. Mundi <cmundi at gmail.com> wrote:
>>
>> > I get the impression that GiST hides a lot of
>> > implementation details.  So I am hungry for details which will help me
>> > assess postGIS/postgreSQL for my application.
>>
>> This is the key point, and it is so: the physical implementation
>> details are hidden behind the GiST API. As a result the R-Tree
>> implementation is a "standard" one, not an R* (though the split method
>> is Ang/Tan, not Guttman). And as a result you can't do things like
>> rebalance the tree as specified in the R* recipe. The GiST API really
>> is quite narrow. You have the consistent function to control reads and
>> the compress/picksplit controlling writes.
>>
>> So if you're looking for optimal tree construction you've come to the
>> wrong place. The primary benefit of the PostGIS indexing system is not
>> its optimal nature but its existence: it's already here, you can
>> insert and query data with simple SQL, it does do locking and
>> consistent operations thanks to the postgresql infrastructure wrapped
>> around it.
>>
>> As an architect my recommendation would be: since the development
>> overhead in building your system from scratch will be quite high,
>> investing the time into a load test on PostGIS first could save you a
>> lot of time if it turns out that even our imperfect system is actually
>> good enough to meet your needs.
>>
>> Best,
>>
>> P.
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>
>
> Aha!  Paul, thank you very much.  This is a big part of what I need to
> know, even though it's not exactly what I wanted to hear.  I already know
> from testing that Ang/Tan splits can (not always but often enough to hurt
> in a way I can measure in money) crush the performance for the natural
> patterns of data+queries in the problem domain.  The fundamental reason is
> that the "minor nuisance" of occasionally suboptimal splits is "leveraged"
> across all compute cores.
>
> So my new strategy is to live with the limits as you've suggested while
> looking for a suitable implementation of R*.  Since I'm not a database
> architect, any system I might build from scratch would be much worse than
> expensive -- it would be broken!  So I'd love to know about any practical
> R* implementations for any FOSS dB.  For that matter, I'd like to at least
> learn of commercial ones as well.
>
> Many thanks for sharing your time and considerable insight!
>
> Best,
> Carlos
>
>
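
For readers who have not met the split method named in the quoted thread, here
is a rough Python sketch of a linear, single-pass node split in the spirit of
Ang/Tan. It is an illustration under assumptions, not the PostGIS picksplit
code: the names (linear_split, group_overlap), the tuple layout of a rectangle,
and the fallback for degenerate nodes are all made up for this note, and the
final tie-break on total coverage is left out.

```python
from typing import List, Tuple

# Hypothetical layout for this sketch: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty list of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def overlap_area(a: Rect, b: Rect) -> float:
    """Area shared by two rectangles, 0.0 if they do not intersect."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def group_overlap(a: List[Rect], b: List[Rect]) -> float:
    """Overlap of the two groups' bounding boxes; infinite if a group is empty."""
    if not a or not b:
        return float("inf")
    return overlap_area(mbr(a), mbr(b))


def linear_split(rects: List[Rect]) -> Tuple[List[Rect], List[Rect]]:
    """Split an overflowing node's entries into two groups in one pass."""
    xmin, ymin, xmax, ymax = mbr(rects)
    left, right, bottom, top = [], [], [], []
    for r in rects:
        # Assign each entry to the node edge it lies closer to, per axis.
        (left if r[0] - xmin < xmax - r[2] else right).append(r)
        (bottom if r[1] - ymin < ymax - r[3] else top).append(r)

    # Prefer the axis that distributes the entries more evenly.
    x_skew = max(len(left), len(right))
    y_skew = max(len(bottom), len(top))
    if x_skew < y_skew:
        a, b = left, right
    elif y_skew < x_skew:
        a, b = bottom, top
    elif group_overlap(left, right) <= group_overlap(bottom, top):
        a, b = left, right  # tie on evenness: take the smaller overlap
    else:
        a, b = bottom, top

    # Assumed fallback (not from the thread): if everything landed on one
    # side, cut the entries in half by x-centre so both groups are non-empty.
    if not a or not b:
        ordered = sorted(rects, key=lambda r: r[0] + r[2])
        half = len(ordered) // 2
        a, b = ordered[:half], ordered[half:]
    return a, b
```

Each entry is looked at once and never reconsidered, which is what makes the
split cheap. It is also why an unlucky data pattern can yield an uneven or
overlapping split that a costlier R*-style split or reinsertion might avoid.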