[postgis-users] Improvement suggestion

Wed Dec 5 11:19:25 PST 2018

On Tue, Dec 4, 2018 at 2:27 AM Darafei "Komяpa" Praliaskouski <me at komzpa.net>
wrote:

> If you are managing large geometries, splitting them off into "geometry
> table" with non-uniquie ID and subdivided parts sometimes helps.
> You want to build a tree on top of your geometry internals to make it all
> fast, one way to do that is to make sure the internal parts of geometry are
> available for indexing in GiST, via ST_Subdivide.
>
> Other beautiful way would be to hide this all behind the scenes in PostGIS
> itself and making ST_Intersection and ST_Intersects optimize the cases you
> mentioned internally, caching some kind of tree internally, probably in the
> geometry itself.
>

The efficient handling of ultra-large geometries, even in a caching case,
is going to involve some core changes to how PostGIS deals with PostgreSQL.
First, we need to make checking the relevance of the cache cheaper. Right
now, checking the cache involves comparing the entire contents of the
current geometry (gserialized form) with the entire contents of the cached
geometry (gserialized form), using a memcmp. The memcmp itself isn't so
expensive, but fully reading in the current geometry (gserialized) *is*
expensive, as the whole thing has to be de-toasted. This gets into the
decompression issues for toasted values, which we can address both by
patching PgSQL [1] and by changing up our own serialization to use
uncompressed storage and compressing the things we want to compress
ourselves (leaving an uncompressed header, for example). Once we are able
to read back only a part of a large geometry, it becomes possible to use a
hashcode in the header to test whether the current cache is still valid,
and that aspect of ultra-large geometry reading gets better.

One aspect that doesn't get better is the selectivity of the bbox of the
ultralarge geometry. This is something that subdividing neatly fixes, at
the same time as it dodges the toasting problem. If you aren't subdividing
then your ultralarge geometry will probably have an extremely
over-determined bounds, and so you will be testing more inputs against the
full geometry than you really want to. Even with a nice, efficient, cached
tree to do the testing with, the penalty of all those unnecessary tests
builds up.

The way around that is to start looking at using inverted indexes and
multi-key coverages of polygons, which looks a lot like the way people with
key/value stores do spatial indexing. There are some implementation issues
there, particularly with geometry, that doesn't have a nice, implicit
coordinate bounds for any given collection of features (as opposed to
geography, which always lives inside (-180,-90,180,90). There's also some
limitations with respect to the current PgSQL implementation which we might
want to address, most notably the use of 32 bit keys in the GIN index. If
we take a bit away for indicating key containment vs overlaps, we're left
with only 31 bits, which in geography space is a about metre resolution (if
memory serves). Not survey grade, and maybe not suitable for all purposes.

Note that many of these issues can be worked around very easily (amazingly
easily, in my opinion) by modelling with a subdivided query table, given
orders of magnitude better performance without any infrastructural changes
to PostGIS *or* PgSQL. The power of homogeneous inputs is hard to
overstate, and the difficulty of dealing efficiently with the full
heterogeneous range of spatial data inputs is hard to understate.

P

>
> On Mon, Dec 3, 2018 at 11:43 PM Paul van der Linden <
> paul.doskabouter at gmail.com> wrote:
>
>> No, didn't do that.
>> Don't think it's going to improve readability of the query to be honest
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
>
> --
> Darafei Praliaskouski
> Support me: http://patreon.com/komzpa
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20181205/cbd8492c/attachment.html>