[postgis-devel] Validity flag

Thu Nov 15 10:01:28 PST 2018

> On Nov 15, 2018, at 8:11 AM, Sandro Santilli <strk at kbt.io> wrote:
> 
> On Thu, Nov 15, 2018 at 04:26:56PM +0100, Hugo Mercier wrote:
>> 
>> 
>> On 15/11/2018 11:55, Sandro Santilli wrote:
>> 
>>> 
>>> I was asking just because I'm pretty sure we need more flagspace
>>> for introducing validity. And we need NOT to consume all flags,
>>> so that we can use the last available flag for specifying there
>>> are more flags following it.
>>> 
>> 
>> Are you suggesting a variable-length flags in postgis 3.0 header ?
>> Each flag byte would have one bit that says "wait there is more" ?
> 
> Yes, also known as the "extension bit"
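
For anyone following along, here is a minimal sketch of what an extension bit could look like in C. Everything in it is an assumption for illustration: the choice of the high bit, the names FLAG_EXTENSION_BIT, flags_length and flag_is_set, and the rule that flags in absent bytes read as unset. None of it is existing PostGIS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the high bit of every flag byte means
 * "another flag byte follows this one". */
#define FLAG_EXTENSION_BIT 0x80u

/* Count the flag bytes at the start of buf.
 * Returns 0 if the buffer ends before a byte without the extension bit. */
static size_t flags_length(const uint8_t *buf, size_t buflen)
{
    size_t n = 0;
    while (n < buflen) {
        if (!(buf[n++] & FLAG_EXTENSION_BIT))
            return n;
    }
    return 0; /* truncated or empty header */
}

/* Test a flag in the flag byte at byte_index.
 * Flags living in bytes that are not present read as "not set". */
static int flag_is_set(const uint8_t *buf, size_t buflen,
                       size_t byte_index, uint8_t mask)
{
    /* The extension bit itself is not a usable flag. */
    assert((mask & FLAG_EXTENSION_BIT) == 0);
    if (byte_index >= flags_length(buf, buflen))
        return 0;
    return (buf[byte_index] & mask) != 0;
}
```

The cost is one usable bit per flag byte, which is why the last free flag has to be held back now rather than spent on validity.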

So, this might be something we no longer care about, but the serialization we have was designed to land the double arrays on alignment boundaries, so it’s possible to read the values directly without copying them out into aligned storage. This may or may not have been a “big win”: Intel architectures allow unaligned reads (with a slight performance penalty) and most of the platforms we run on are Intel, but some RISC architectures do not allow unaligned reads, so copying is required there.
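
To make the two access patterns concrete, a rough sketch in C. The function names are made up for illustration and this is not the liblwgeom code; it only contrasts a copy that works at any offset with a direct read that is valid only on an aligned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable path: copy the 8 bytes into an aligned local variable.
 * Works for any byte offset, on any architecture. */
static double read_double_copy(const uint8_t *p)
{
    double d;
    memcpy(&d, p, sizeof d);
    return d;
}

/* True when p sits on a boundary suitable for a double. */
static int is_aligned_for_double(const void *p)
{
    return ((uintptr_t)p % _Alignof(double)) == 0;
}

/* Direct path: no copy, but only legal when the address is aligned
 * and the memory really holds doubles. */
static double read_double_direct(const void *p)
{
    assert(is_aligned_for_double(p));
    return *(const double *)p;
}
```

With an aligned serialization the coordinate array can be handed out as a plain double pointer; without it, every coordinate goes through something like read_double_copy.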

For large objects that end up compressed this is all kind of moot, and if we go to an internal compression scheme in gserialized3 then there’s a copy step anyways and the coordinate destination can be an aligned array. For small objects it’s possible we’ve been getting a win maintaining alignment.

All this to say, as long as we keep the coordinates aligned, optional header components almost *have* to add 8 bytes to a stored object, so we’d want to ensure that any flags going there are relatively rare or special case, not common. Or, we go back to copy-on-deserialize everywhere, as in PostGIS 1.x, and have a slimmer serialization with more optional parts. With the use of expanded object headers in PgSQL, the overhead of copy-on-deserialize can be reduced in function(chaining(cases())) but the overhead for a single function call will be unavoidable.
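
The 8-byte arithmetic, sketched in C under the assumption that the coordinates must start on an 8-byte boundary. The helper names and the example header sizes are hypothetical, not taken from the current code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: doubles must start on an 8-byte boundary. */
#define COORD_ALIGNMENT 8u

/* Round a header size up to the next multiple of COORD_ALIGNMENT. */
static size_t padded_header_size(size_t header_bytes)
{
    assert(header_bytes <= SIZE_MAX - (COORD_ALIGNMENT - 1));
    return (header_bytes + COORD_ALIGNMENT - 1) / COORD_ALIGNMENT * COORD_ALIGNMENT;
}

/* Bytes actually added to the stored object when extra_bytes of
 * optional header are appended to a base header. */
static size_t optional_component_cost(size_t base_header_bytes, size_t extra_bytes)
{
    return padded_header_size(base_header_bytes + extra_bytes)
         - padded_header_size(base_header_bytes);
}
```

For a base header that already ends on an 8-byte boundary, optional_component_cost(8, 1) and optional_component_cost(8, 8) both come out to 8: one extra flag byte costs as much as eight.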

I am interested in doing a new serialization, but there are many things to balance and a lot of decisions that we should probably actually *measure* this time, as past decisions made in the “gut” (for example, using main storage over extended) have turned out to be counter-intuitively wrong.

So, I’m happy to have this conversation, but I don’t want people to lift up their tools and start hacking next week… how about we start by writing out some proposals in text documents? See for example the old https://github.com/postgis/postgis/blob/svn-trunk/liblwgeom/g_serialized.txt

P

> 
> --strk;
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-devel
