[postgis-devel] [postgis-users] Improvement suggestion

Paul Ramsey pramsey at cleverelephant.ca
Wed Dec 5 11:20:24 PST 2018


Forgot to include my reference [1]
https://commitfest.postgresql.org/21/1868/

On Wed, Dec 5, 2018 at 11:19 AM Paul Ramsey <pramsey at cleverelephant.ca>
wrote:

>
>
> On Tue, Dec 4, 2018 at 2:27 AM Darafei "Komяpa" Praliaskouski <
> me at komzpa.net> wrote:
>
>> If you are managing large geometries, splitting them off into a "geometry
>> table" with a non-unique ID and subdivided parts sometimes helps.
>> You want to build a tree on top of your geometry internals to make it all
>> fast; one way to do that is to make sure the internal parts of the
>> geometry are available for indexing in GiST, via ST_Subdivide.
>>
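The subdivided-table pattern described above can be sketched roughly like this (the table and column names are hypothetical, and the 256-vertex cap is just an illustrative choice):

```sql
-- Hypothetical source table: areas(id, geom).
-- Split each large geometry into parts of at most 256 vertices;
-- the id column is deliberately non-unique in the new table.
CREATE TABLE areas_subdivided AS
  SELECT id, ST_Subdivide(geom, 256) AS geom
  FROM areas;

-- Index the small parts, so the GiST index effectively sees the
-- internals of each original geometry.
CREATE INDEX areas_subdivided_geom_idx
  ON areas_subdivided USING gist (geom);
```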
>> Another beautiful way would be to hide all of this behind the scenes in
>> PostGIS itself, making ST_Intersection and ST_Intersects optimize the
>> cases you mentioned internally, caching some kind of tree, probably in
>> the geometry itself.
>>
>
> The efficient handling of ultra-large geometries, even in a caching case,
> is going to involve some core changes to how PostGIS deals with PostgreSQL.
> First, we need to make checking the relevance of the cache cheaper. Right
> now, checking the cache involves comparing the entire contents of the
> current geometry (gserialized form) with the entire contents of the cached
> geometry (gserialized form), using a memcmp. The memcmp itself isn't so
> expensive, but fully reading in the current geometry (gserialized) *is*
> expensive, as the whole thing has to be de-toasted. This gets into the
> decompression issues for toasted values, which we can address both by
> patching PgSQL [1] and by changing up our own serialization to use
> uncompressed storage and compressing the things we want to compress
> ourselves (leaving an uncompressed header, for example). Once we are able
> to read back only a part of a large geometry, it becomes possible to use a
> hashcode in the header to test whether the current cache is still valid,
> and that aspect of ultra-large geometry reading gets better.
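The header-hashcode idea can be illustrated with a toy Python sketch (the 8-byte digest and the value layout here are purely illustrative, not the actual gserialized format):

```python
import hashlib

def make_value(payload: bytes) -> bytes:
    """Store a short hash of the payload in a fixed header,
    followed by the (potentially huge) payload itself."""
    digest = hashlib.sha256(payload).digest()[:8]  # 8-byte "hashcode"
    return digest + payload

def cache_valid_full(current: bytes, cached: bytes) -> bool:
    # Today's scheme: compare entire values, which forces the
    # whole current value to be read (de-toasted) first.
    return current == cached

def cache_valid_header(current: bytes, cached: bytes) -> bool:
    # With a hashcode in the header, only the first 8 bytes of
    # the current value need to be read back.
    return current[:8] == cached[:8]

big = make_value(b"x" * 1_000_000)
same = make_value(b"x" * 1_000_000)
other = make_value(b"y" * 1_000_000)

assert cache_valid_full(big, same)
assert cache_valid_header(big, same)
assert not cache_valid_header(big, other)
```

The point of the sketch: both checks give the same answer, but the header check only touches a fixed-size prefix, which is what partial detoasting would make cheap.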
>
> One aspect that doesn't get better is the selectivity of the bbox of the
> ultralarge geometry. This is something that subdividing neatly fixes, at
> the same time as it dodges the toasting problem. If you aren't subdividing
> then your ultralarge geometry will probably have extremely
> over-determined bounds, and so you will be testing more inputs against the
> full geometry than you really want to. Even with a nice, efficient, cached
> tree to do the testing with, the penalty of all those unnecessary tests
> builds up.
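A toy Python sketch of the bbox-selectivity point (shapes and numbers invented for illustration): a single bbox around a sparse shape passes every candidate on to the exact test, while the subdivided parts' bboxes reject them up front:

```python
# Bounding boxes as (x0, y0, x1, y1) tuples.
def bbox_overlaps(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

# An L-shape occupying the left column and bottom row of a 10x10 square.
parts = [(0, 0, 1, 10),   # left column
         (1, 0, 10, 1)]   # bottom row
whole = (0, 0, 10, 10)    # single bbox of the full L-shape

# Query points spread over the empty interior of the L.
queries = [(x, y, x, y) for x in range(2, 10) for y in range(2, 10)]

hits_whole = sum(bbox_overlaps(whole, q) for q in queries)
hits_parts = sum(any(bbox_overlaps(p, q) for p in parts) for q in queries)

# The single bbox forwards every query to the expensive exact test;
# the subdivided bboxes reject all of them at the index stage.
print(hits_whole, hits_parts)  # -> 64 0
```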
>
> The way around that is to start looking at using inverted indexes and
> multi-key coverages of polygons, which looks a lot like the way people with
> key/value stores do spatial indexing. There are some implementation issues
> there, particularly with geometry, which doesn't have nice, implicit
> coordinate bounds for any given collection of features (as opposed to
> geography, which always lives inside (-180,-90,180,90)). There are also
> some limitations with respect to the current PgSQL implementation which we
> might want to address, most notably the use of 32 bit keys in the GIN
> index. If we take a bit away for indicating key containment vs overlaps,
> we're left with only 31 bits, which in geography space is about metre
> resolution (if memory serves). Not survey grade, and maybe not suitable
> for all purposes.
>
> Note that many of these issues can be worked around very easily (amazingly
> easily, in my opinion) by modelling with a subdivided query table, given
> orders of magnitude better performance without any infrastructural changes
> to PostGIS *or* PgSQL. The power of homogeneous inputs, and the
> difficulty of dealing efficiently with the full heterogeneous range of
> spatial data inputs, are both hard to overstate.
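As a rough sketch of that modelling approach (all table and column names hypothetical), a query against a subdivided table keeps each exact test small, at the cost of collapsing duplicate matches from multi-part hits:

```sql
-- Hypothetical tables: points(pt_id, geom) and
-- parcels_subdivided(id, geom), the latter built with ST_Subdivide.
-- Each point is tested only against the small parts whose bboxes it
-- hits, and DISTINCT collapses multiple matching parts of one parcel.
SELECT DISTINCT p.pt_id, s.id
FROM points p
JOIN parcels_subdivided s
  ON ST_Intersects(p.geom, s.geom);
```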
>
> P
>
>
>
>>
>> On Mon, Dec 3, 2018 at 11:43 PM Paul van der Linden <
>> paul.doskabouter at gmail.com> wrote:
>>
>>> No, didn't do that.
>>> Don't think it's going to improve readability of the query to be honest
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>>
>> --
>> Darafei Praliaskouski
>> Support me: http://patreon.com/komzpa
>
>

