[postgis-devel] Re: [postgis-users] GIST index speed
Mark Cave-Ayland
mark.cave-ayland at siriusit.co.uk
Thu Jun 19 09:30:59 PDT 2008
Tom Lane wrote:
> Datum? No. Datum has to be a scalar type --- turning it into a
> struct would break an impossibly large amount of code, and the
> efficiency implications aren't pleasant either.
>
> However, it does seem like a mechanism that's able to handle caching
> of detoasting operations might be able to solve your problem too,
> if it's got a reasonably flexible notion of what detoasting means.
> I'll try to keep your issue in mind while working on that.
Okay. The reason I was thinking about Datum was because I was thinking
about trying to cache the results of the PostGIS -> GEOS transformation
for non-TOASTed as well as TOASTed objects. So effectively we'd need to
keep both the PostgreSQL geometry type and the GEOS-converted PostgreSQL
geometry type around where we can access them. With your suggestion, I'm
guessing that we would only be able to cache the PostGIS -> GEOS
transformation for TOASTed objects. Maybe that would be enough, although
it's one of those things that would be difficult to test without an
actual implementation :(
> Did you see the alternative design sketch at
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00777.php
> ? I'm trying to decide which way to pursue. The second proposal
> would have a lot less management overhead, but would not be able
> to cache detoast work across queries, which is something that might
> be critical for your case. Or not --- I'm not very clear on the
> use pattern you foresee. Datum certainly wouldn't have a cross-query
> lifespan either.
Yes, I did take a look at it, although it's beginning to get outside of
my sphere of knowledge. Since geometries can get quite big, the most
common query pattern is accessing TOASTed geometries within a correlated
subquery, or as part of an (inner) JOIN using the && operator against
two columns containing TOASTed geometries in different tables.
Note that because geometries contain floating point numbers and are
multi-dimensional, the only real way to perform the join is using a
Nested Loop. Most of the slow queries I see are of this form; so it may
be that a cache simply for the duration of a single query would be
sufficient for the majority of cases.
ATB,
Mark.
--
Mark Cave-Ayland
Sirius Corporation - The Open Source Experts
http://www.siriusit.co.uk
T: +44 870 608 0063
More information about the postgis-devel
mailing list