[postgis-devel] gserialized-v2

Fri Jun 21 15:01:54 PDT 2019

So, the PR finally builds and regresses, and includes the initial
scaffolding on which new features using new flagging can be built.

https://github.com/postgis/postgis/pull/421

In order to get there, you'll note some structural changes around the code base:

- All access to gserialized internals is now behind an API that lives
in gserialized.c
- The homology between gserialized->flags and lwgeom->flags is now
broken -- gserialized2 has an 8-bit gflags member, and an optional
64-bit xflags area managed behind the api, while lwgeom->flags has
been expanded to 16 bits for now, with room potentially to get to 24
with some extra contortions as necessary in the future
- That means you can continue to call FLAGS_SET/GET macros on
lwgeom->flags, but you should no longer assume you can do so on
gserialized->gflags; instead, call the relevant
gserialized_get_*() functions from the API
- (All the above changes have been cleaned out of the code base, so
this is on a going forward basis, there is no work to be done by
anyone to implement the above, it's already done)
- lwgeom_from_gserialized() will magically work regardless of whether
it's fed a v1 or v2 serialization, so this code all works fine when
slapped on top of a database full of old geometries
- gserialized_from_lwgeom() (and the two functions riding on top of
it, geometry_serialize() and geography_serialize(), which are
preferred when calling from inside ./postgis) will write out v2
geometries, so databases will slowly get re-written with the new form
over time
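
To make the flags split above concrete, here is a toy sketch of the
pattern. Every name in it (the toy_* types, the TOY_* macros and the bit
positions) is invented for illustration and is not the real liblwgeom
definition; the only point is that the serialized bit layout is private
to the accessors, while the in-memory flags stay a plain integer that
macros can operate on.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the real PostGIS definitions. */

/* In-memory flags: 16 bits, still safe to touch with macros. */
#define TOY_FLAG_Z 0x0001
#define TOY_FLAG_M 0x0002
#define TOY_FLAGS_GET_Z(f) (((f) & TOY_FLAG_Z) != 0)
#define TOY_FLAGS_GET_M(f) (((f) & TOY_FLAG_M) != 0)
#define TOY_FLAGS_SET_Z(f, v) \
    ((f) = (uint16_t)((v) ? ((f) | TOY_FLAG_Z) : ((f) & ~TOY_FLAG_Z)))

typedef struct { uint16_t flags; } toy_lwgeom;

/* Serialized flags: 8 bits, with a made-up layout that deliberately
 * differs from the in-memory one, so the macros above would misread it. */
#define TOY_GFLAG_Z 0x04
#define TOY_GFLAG_M 0x08

typedef struct { uint8_t gflags; } toy_gserialized;

/* Callers go through accessors and never look at gflags directly. */
int toy_gserialized_has_z(const toy_gserialized *g)
{
    return (g->gflags & TOY_GFLAG_Z) != 0;
}

int toy_gserialized_has_m(const toy_gserialized *g)
{
    return (g->gflags & TOY_GFLAG_M) != 0;
}

/* Translate the serialized layout into the in-memory layout. */
uint16_t toy_lwflags_from_gserialized(const toy_gserialized *g)
{
    uint16_t flags = 0;
    if (toy_gserialized_has_z(g)) flags |= TOY_FLAG_Z;
    if (toy_gserialized_has_m(g)) flags |= TOY_FLAG_M;
    assert(TOY_FLAGS_GET_Z(flags) == toy_gserialized_has_z(g));
    return flags;
}
```

In this toy, calling TOY_FLAGS_GET_Z() directly on gflags would return
the wrong answer, which is the kind of mistake the accessor layer is
meant to rule out.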

So far all this deck chair rearranging has resulted in no net-new
functionality. I'm looking at three potential areas where the new
serialization could be put to use before postgis3:

- The extra flag space on gflags that has been freed up by moving the
IsSolid flag into the xflags area could be used for an IsPoint flag, to
allow a lightweight point type. A lightweight x/y point could then be
24 bytes (varsize + srid/flags + x + y) instead of 32 bytes (varsize +
srid/flags + type + padding + x + y), for a 25% savings for simple
points.
- An extended flag and optional data area to hold a hash code could be
used to accelerate the prepared geometry cache for those cases where
the cached object is large, avoiding much of the current repetitive
decompression overhead.
- Some kind of validity caching could be put into place for larger
objects using a ValidChecked/IsValid pair of flags in the extended
area. The open question for me would be -- when to set those flags, what
triggers it? Obviously ST_MakeValid() could set it, but what about
automatically for any large polygons during serialization?
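
As a rough check on the point-size arithmetic, and to show what a
ValidChecked/IsValid pair might look like, here is a small sketch. The
struct layouts, the toy_* names and the bit positions are assumptions
made up for illustration; they mirror the field lists in the bullets
above rather than the actual gserialized layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts following the field lists in the text. */
typedef struct {
    uint32_t varsize;   /* varlena length word */
    uint8_t  srid[3];   /* srid packed next to the flags */
    uint8_t  gflags;
    uint32_t type;
    uint32_t padding;
    double   x, y;
} toy_point_full;

typedef struct {
    uint32_t varsize;
    uint8_t  srid[3];
    uint8_t  gflags;    /* an IsPoint bit here would replace the type word */
    double   x, y;
} toy_point_light;

/* Bytes saved per simple point by dropping type + padding. */
size_t toy_bytes_saved(void)
{
    return sizeof(toy_point_full) - sizeof(toy_point_light);
}

/* Savings as a percentage of the full-size point. */
double toy_percent_saved(void)
{
    return 100.0 * (double)toy_bytes_saved() / (double)sizeof(toy_point_full);
}

/* Hypothetical validity cache bits in the 64-bit extended flags. */
#define TOY_XFLAG_VALID_CHECKED (UINT64_C(1) << 0)
#define TOY_XFLAG_IS_VALID      (UINT64_C(1) << 1)

typedef enum {
    TOY_VALIDITY_UNKNOWN,  /* never checked, caller must run the test */
    TOY_VALIDITY_INVALID,
    TOY_VALIDITY_VALID
} toy_validity;

/* IsValid only means something once ValidChecked is set. */
toy_validity toy_cached_validity(uint64_t xflags)
{
    if (!(xflags & TOY_XFLAG_VALID_CHECKED))
        return TOY_VALIDITY_UNKNOWN;
    return (xflags & TOY_XFLAG_IS_VALID) ? TOY_VALIDITY_VALID
                                         : TOY_VALIDITY_INVALID;
}

/* Record the outcome of a validity test in the extended flags. */
uint64_t toy_cache_validity(uint64_t xflags, int is_valid)
{
    xflags |= TOY_XFLAG_VALID_CHECKED;
    if (is_valid)
        xflags |= TOY_XFLAG_IS_VALID;
    else
        xflags &= ~TOY_XFLAG_IS_VALID;
    assert(toy_cached_validity(xflags) != TOY_VALIDITY_UNKNOWN);
    return xflags;
}
```

Two bits rather than one are needed because "not yet checked" has to be
distinguishable from "checked and invalid"; when the ValidChecked bit
actually gets set is still the open question above.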

I apologize in advance for the fundamental ugliness of the work, I'm
not an artist, I'm a tradesman.

ATB,
P