[postgis-devel] Re: [postgis-users] GIST index speed

Thu Jun 19 09:30:59 PDT 2008

Tom Lane wrote:

> Datum?  No.  Datum has to be a scalar type --- turning it into a
> struct would break an impossibly large amount of code, and the
> efficiency implications aren't pleasant either.
> 
> However, it does seem like a mechanism that's able to handle caching
> of detoasting operations might be able to solve your problem too,
> if it's got a reasonably flexible notion of what detoasting means.
> I'll try to keep your issue in mind while working on that.

Okay. The reason I was thinking about Datum was because I was thinking 
about trying to cache the results of the PostGIS -> GEOS transformation 
for non-TOASTed as well as TOASTed objects. So effectively we'd need to 
keep both the PostgreSQL geometry type and the GEOS-converted PostgreSQL 
geometry type around where we can access them. With your suggestion, I'm 
guessing that we would only be able to cache the PostGIS -> GEOS 
transformation for TOASTed objects. Maybe that would be enough, although 
it's one of those things that would be difficult to test without an 
actual implementation :(

> Did you see the alternative design sketch at
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> ?  I'm trying to decide which way to pursue.  The second proposal
> would have a lot less management overhead, but would not be able
> to cache detoast work across queries, which is something that might
> be critical for your case.  Or not --- I'm not very clear on the
> use pattern you foresee.  Datum certainly wouldn't have a cross-query
> lifespan either.

Yes, I did take a look at it, although it's beginning to get outside of 
my sphere of knowledge. Since geometries can get quite big, the most 
common query pattern is accessing TOASTed geometries within a correlated 
subquery, or as part of an (inner) JOIN using the && operator against 
two columns containing TOASTed geometries in different tables.

Note that because geometries contain floating point numbers and are 
multi-dimensional, the only real way to perform the join is using a 
Nested Loop. Most of the slow queries I see are of this form; so it may 
be that a cache simply for the duration of a single query would be 
sufficient for the majority of cases.

ATB,

Mark.

-- 
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063