[postgis-users] High Concurrency R* in GiST?

Mon Dec 5 14:47:07 PST 2011

Thank you very much, Jochen.  This can at least teach me how much trouble
I'm getting into!

Cheers,
Carlos
On Dec 5, 2011 3:29 PM, "Jochen Albrecht" <jochen.albrecht at gmail.com> wrote:

>
> Have a look at http://www.cs.purdue.edu/spgist, Carlos. It was written
> for a dated version of PostgreSQL, but given your background, it may be
> less work than building something from scratch. It is a fairly low-level
> implementation of Samet's 2006 bible and hence allows you to derive R* trees
> according to section 2.1.5.2 of the book.
> Cheers,
>     Jochen
>
>
> On Mon, Dec 5, 2011 at 4:54 PM, C. Mundi <cmundi at gmail.com> wrote:
>
>>
>>
>> On Mon, Dec 5, 2011 at 2:09 PM, Paul Ramsey <pramsey at opengeo.org> wrote:
>>
>>> On Mon, Dec 5, 2011 at 11:05 AM, C. Mundi <cmundi at gmail.com> wrote:
>>>
>>> > I get the impression that GiST hides a lot of
>>> > implementation details.  So I am hungry for details which will help me
>>> > assess postGIS/postgreSQL for my application.
>>>
>>> This is the key point, and it is so: the physical implementation
>>> details are hidden behind the GiST API. As a result the R-Tree
>>> implementation is a "standard" one, not an R* (though the split method
>>> is Ang/Tan, not Guttman). And as a result you can't do things like
>>> rebalance the tree as specified in the R* recipe. The GiST API really
>>> is quite narrow. You have the consistent function to control reads and
>>> the compress/picksplit controlling writes.
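
To make the narrowness of that interface concrete, here is a minimal toy sketch in Python of the hooks an index author gets to write. It is not the PostGIS or PostgreSQL code (the real support functions are written in C), and the Box type and every function body below are assumptions made for illustration. union and penalty are not named above, but they belong to the same set of GiST support functions. The picksplit shown is a deliberately naive median cut, neither Ang/Tan nor Guttman, and nothing in the interface offers a hook for R*-style forced reinsertion.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    """Hypothetical 2-D bounding box used as the index key."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)


def compress(points: Sequence[Tuple[float, float]]) -> Box:
    """Write side: reduce a geometry (here a list of vertices) to its key."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return Box(min(xs), min(ys), max(xs), max(ys))


def union(boxes: Sequence[Box]) -> Box:
    """Smallest box covering all entries; this becomes the parent key."""
    return Box(min(b.xmin for b in boxes), min(b.ymin for b in boxes),
               max(b.xmax for b in boxes), max(b.ymax for b in boxes))


def consistent(key: Box, query: Box) -> bool:
    """Read side: could the subtree under `key` hold a box overlapping `query`?"""
    return (key.xmin <= query.xmax and query.xmin <= key.xmax and
            key.ymin <= query.ymax and query.ymin <= key.ymax)


def penalty(key: Box, new: Box) -> float:
    """Write side: area growth of `key` if `new` were inserted beneath it."""
    return union([key, new]).area() - key.area()


def picksplit(entries: Sequence[Box]) -> Tuple[List[Box], List[Box]]:
    """Write side: divide an overflowing page into two groups.

    Naive placeholder: order by centre along the wider axis, cut at the median.
    """
    cover = union(entries)
    if cover.xmax - cover.xmin >= cover.ymax - cover.ymin:
        ordered = sorted(entries, key=lambda b: b.xmin + b.xmax)
    else:
        ordered = sorted(entries, key=lambda b: b.ymin + b.ymax)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


if __name__ == "__main__":
    page = [Box(0, 0, 1, 1), Box(5, 0, 6, 1), Box(0.5, 0, 1.5, 1), Box(5.5, 0, 6.5, 1)]
    left, right = picksplit(page)
    print(union(left), union(right))
```

The tree walking, page layout, locking and logging all live outside these functions, which is why the split heuristic can be swapped but the overall insertion strategy cannot.
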
>>>
>>> So if you're looking for optimal tree construction you've come to the
>>> wrong place. The primary benefit of the PostGIS indexing system is not
>>> its optimal nature but its existence: it's already here, you can
>>> insert and query data with simple SQL, it does do locking and
>>> consistent operations thanks to the PostgreSQL infrastructure wrapped
>>> around it.
>>>
>>> As an architect my recommendation would be: since the development
>>> overhead in building your system from scratch will be quite high,
>>> investing the time into a load test on PostGIS first could save you a
>>> lot of time if it turns out that even our imperfect system is actually
>>> good enough to meet your needs.
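
A load test of the kind suggested above could start from something like the following sketch, which only generates SQL text for a mixed insert/query workload against a GiST-indexed table. The table name, SRID 4326, box sizes and the uniform random distribution are all assumptions for illustration; a meaningful test would replay the application's own data and query patterns, and would run the statements from many concurrent connections using whatever client is at hand.

```python
import random
from typing import List


def setup_sql(table: str = "loadtest_boxes") -> str:
    """DDL for a hypothetical test table with a GiST index on its geometry."""
    return (
        f"CREATE TABLE {table} (id serial PRIMARY KEY, geom geometry);\n"
        f"CREATE INDEX {table}_geom_gist ON {table} USING GIST (geom);"
    )


def random_envelope(rng: random.Random, max_size: float) -> str:
    """SQL expression for a random rectangle in lon/lat space (assumed SRID 4326)."""
    x = rng.uniform(-180.0, 180.0 - max_size)
    y = rng.uniform(-90.0, 90.0 - max_size)
    w = rng.uniform(0.0, max_size)
    h = rng.uniform(0.0, max_size)
    return f"ST_MakeEnvelope({x:.6f}, {y:.6f}, {x + w:.6f}, {y + h:.6f}, 4326)"


def insert_sql(rng: random.Random, table: str = "loadtest_boxes") -> str:
    """One small rectangle inserted through the index."""
    return f"INSERT INTO {table} (geom) VALUES ({random_envelope(rng, 0.01)});"


def query_sql(rng: random.Random, table: str = "loadtest_boxes") -> str:
    """One bounding-box overlap query; && is the operator the GiST index serves."""
    return f"SELECT count(*) FROM {table} WHERE geom && {random_envelope(rng, 1.0)};"


def workload(n: int, write_fraction: float, seed: int = 0) -> List[str]:
    """n statements, roughly write_fraction of them inserts, reproducible by seed."""
    rng = random.Random(seed)
    return [insert_sql(rng) if rng.random() < write_fraction else query_sql(rng)
            for _ in range(n)]


if __name__ == "__main__":
    print(setup_sql())
    for stmt in workload(5, write_fraction=0.5):
        print(stmt)
```

Timing the same workload at one connection and then at the target core count gives a first reading on whether split quality actually limits throughput for a given data set.
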
>>>
>>> Best,
>>>
>>> P.
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at postgis.refractions.net
>>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>>
>>
>>
>> Aha!  Paul, thank you very much.  This is a big part of what I need to
>> know, even though it's not exactly what I wanted to hear.  I already know
>> from testing that Ang/Tan splits can (not always but often enough to hurt
>> in a way I can measure in money) crush the performance for the natural
>> patterns of data+queries in the problem domain.  The fundamental reason is
>> that the "minor nuisance" of occasionally suboptimal splits is "leveraged"
>> across all compute cores.
>>
>> So my new strategy is to live with the limits as you've suggested while
>> looking for a suitable implementation of R*.  Since I'm not a database
>> architect, any system I might build from scratch would be much worse than
>> expensive -- it would be broken!  So I'd love to know about any practical
>> R* implementations for any FOSS DB.  For that matter, I'd like to at least
>> learn of commercial ones as well.
>>
>> Many thanks for sharing your time and considerable insight!
>>
>> Best,
>> Carlos
>>
>>
>>
>>
>
>
>