[postgis-devel] Review of TWKB spec

Nicklas Avén nicklas.aven at jordogskog.no
Wed Jun 19 12:50:24 PDT 2013


Thanks a lot, Even.

See comments inline.

> * I would drop the bit for endianness. Just state that the endianness
> is little endian, and let the few big endian hosts in the wild do the
> byte swapping. This will make implementation of the spec easier.

Sounds great to me. I just copied that from WKB.
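With a fixed little-endian byte order, a decoder never has to check a flag. It just asks for little-endian explicitly, and a big-endian host does the swap transparently. A minimal Python sketch, assuming a UINT32 field (such as the ID) at a hypothetical offset in the buffer:

```python
import struct

def read_uint32_le(buf: bytes, offset: int = 0) -> int:
    # '<' forces little-endian regardless of the host's native byte order,
    # so the same code runs unchanged on big-endian machines.
    (value,) = struct.unpack_from("<I", buf, offset)
    return value

# Hypothetical example: the value 1 stored as 4 little-endian bytes.
print(read_uint32_le(b"\x01\x00\x00\x00"))  # 1
```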

> * use the bit that has been saved above to indicate if the geometries
> have an id or not. There are certainly use cases where we don't need
> the id, or where it is transported by other means.

Why not.
Another idea I have thought about is adding an extra whole byte. In that
byte, 3 bits would give the number of bytes used for the ID; if there is
no ID, the number of bytes is 0. Three other bits of that byte would serve
the same purpose for the first vertex, instead of always using 4 bytes.
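To make that extra byte concrete, here is a sketch of packing and unpacking it. The bit layout (bits 0-2 for the ID size, bits 3-5 for the first-vertex size, bits 6-7 unused) is only an assumption for illustration; the proposal does not fix the bit positions.

```python
def pack_size_byte(id_bytes: int, vertex_bytes: int) -> int:
    # Hypothetical layout: bits 0-2 = number of bytes in the ID (0 = no ID),
    # bits 3-5 = number of bytes per axis for the first vertex, bits 6-7 unused.
    if not (0 <= id_bytes <= 7 and 0 <= vertex_bytes <= 7):
        raise ValueError("each field must fit in 3 bits (0-7)")
    return id_bytes | (vertex_bytes << 3)

def unpack_size_byte(b: int) -> tuple[int, int]:
    # Returns (id_bytes, vertex_bytes).
    return b & 0b111, (b >> 3) & 0b111

# No ID, first vertex stored with 2 bytes per axis:
print(pack_size_byte(0, 2))   # 16
print(unpack_size_byte(16))   # (0, 2)
```

With this layout, a geometry without an ID pays nothing for it except the shared header byte.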

> * instead of using UINT32 that is always 4 byte long, I would rather
> use the Google Protocol Buffer way of encoding integers. That way if
> the number is lesser than 128, it will fit on a single byte. See
> https://developers.google.com/protocol-buffers/docs/encoding#varints .

Yes, this is interesting.

I didn't know about it until Oliver Tonnhofer pointed me to it here:
https://github.com/nicklasaven/TWKB/issues/1
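For reference, the varint scheme Even describes stores 7 payload bits per byte and sets the high bit on every byte except the last, so any value below 128 fits in a single byte. A minimal Python sketch of unsigned encoding and decoding, following the Protocol Buffers encoding documentation:

```python
def encode_varint(n: int) -> bytes:
    # Unsigned varint: low 7 bits first, high bit set while more bytes follow.
    if n < 0:
        raise ValueError("varint encodes non-negative integers; zigzag-encode signed values first")
    out = bytearray()
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

def decode_varint(buf: bytes, offset: int = 0) -> tuple[int, int]:
    # Returns (value, offset of the next unread byte).
    result = 0
    shift = 0
    while True:
        byte = buf[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7

print(encode_varint(127))  # b'\x7f'      (1 byte)
print(encode_varint(300))  # b'\xac\x02'  (2 bytes)
```

Because the length is self-describing, the same decoder handles 64-bit or larger values without any extra size flag.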

I have thought about it some, and I think it might be the right way to
go, but there are two things I think we should consider.
1) From the tests I have done with twkb, when it is used for display
purposes, 1 byte is often enough for the delta value. For instance, with
a meter-based projection and whole-meter precision, the delta value is
very often between -128 and +127 meters, so only 1 byte is used. With
varint (plus zigzag encoding for the sign), 1 byte only covers -64 to +63,
and I have a feeling that will quite often lead to 2-byte values. But this
I don't know.
2) I suspect that reading varints in JavaScript will use more
resources. For the client it is now quite straightforward. With varint
and zigzag encoding there are a few steps before you have the integer to
use in a canvas or wherever. It would be interesting to compare the
performance of the two approaches. What I have found is that mobile
devices are much slower at parsing twkb than even quite old laptops.
Maybe varint even makes it faster, but I suspect it might get slower.
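To make point 1 concrete: a signed byte holds deltas from -128 to +127, while a zigzag-encoded varint has only 7 payload bits in its first byte, so one byte covers just -64 to +63. The sketch below shows the zigzag mapping (as described in the Protocol Buffers encoding documentation) and checks that range. It only counts bytes; it says nothing about parsing speed, which is the open question in point 2.

```python
def zigzag(n: int) -> int:
    # Signed -> unsigned: 0->0, -1->1, 1->2, -2->3, ...
    # (Written arithmetically because Python ints have no fixed width.)
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(z: int) -> int:
    # Inverse mapping: the extra decode step a client has to perform.
    return (z >> 1) ^ -(z & 1)

def varint_len(u: int) -> int:
    # Number of bytes a non-negative integer needs as a varint.
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length

one_byte = [d for d in range(-200, 201) if varint_len(zigzag(d)) == 1]
print(min(one_byte), max(one_byte))  # -64 63
```

So deltas between 64 and 127 meters (in either direction) would cost 2 bytes as varints but only 1 byte as plain INT8.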

I think mobile devices are an important target, since twkb should be
compact and well suited for caching in SpatiaLite or some other store
for offline use in the field.

> 
> * for polygons and multipolygons, you could save a few bytes by
> specifying that the last vertex of a ring isn't serialized. It is left
> to the decoder to add it to properly close the ring.
Yes, you are absolutely right. 
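On the decoder side this costs almost nothing: the closing vertex is simply the first vertex repeated. A minimal sketch, assuming a ring has already been decoded into a list of (x, y) integer tuples; it saves one vertex (two deltas in 2D) per ring.

```python
Point = tuple[int, int]

def close_ring(vertices: list[Point]) -> list[Point]:
    # The serialized ring omits its closing vertex; restore it by
    # repeating the first vertex at the end.
    if not vertices:
        return vertices
    return vertices + [vertices[0]]

print(close_ring([(0, 0), (10, 0), (10, 10)]))
# [(0, 0), (10, 0), (10, 10), (0, 0)]
```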

> 
> * typos in paragraph "Type 24, MultiGometryCollection (with individual
> id)",
> 
> - MultiGometryCollection --> MultiGeometryCollection
> 
> - you don't mean "UINT32 npolygons a 4 byte integer holding number of
> polygons", but probably "UINT32 ngeometries a 4 byte integer holding
> number of geometries".
> 
> - Instead of MultiGeometryCollection, I would call it
> "GeometryCollection (with individual id)" and would allow any geometry
> in it.

Great, I will fix them.

> 
> * As we live in a binary world, it might perhaps be better to use
> 2^precision scaling instead of 10^precision. Someone smart could
> probably avoid any floating point operation by having fun with the
> IEEE754 representation of floating point numbers and bit shift
> operations.

This I know too little about, but it sounds smart :-)
As I understand it, all numbers in JavaScript are floats, even integers.
But I do not know whether that means all calculations are floating-point
operations. Anyway, this format should of course be designed for more
targets than JavaScript.
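Regarding 2^precision: dividing a stored integer by a power of two only changes the floating-point exponent, so the result is exact as long as the integer fits in the 53-bit mantissa of an IEEE 754 double (which is also what a JavaScript number is). Dividing by 10^precision generally rounds. A small Python sketch of the difference, using math.ldexp to scale by an exact power of two; note that precision would then be counted in bits (roughly 3.3 bits per decimal digit) rather than decimal digits.

```python
import math
from fractions import Fraction

def decode_pow2(stored: int, precision: int) -> float:
    # stored * 2**-precision: only the exponent changes, so this is
    # exact while |stored| < 2**53.
    return math.ldexp(stored, -precision)

def decode_pow10(stored: int, precision: int) -> float:
    # stored / 10**precision: rounded to the nearest double.
    return stored / 10**precision

# Fraction(float) gives the exact value the double actually holds.
print(Fraction(decode_pow2(12345, 4)) == Fraction(12345, 2**4))    # True
print(Fraction(decode_pow10(12345, 4)) == Fraction(12345, 10**4))  # False
```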
> 
> * Delta value encoding : use GPB varint. This would also enable you to
> support easily 64bit or 128bit deltas when they sometimes occur. That
> would remove the need for method nr 1.
It is probably the right choice. I just want to discuss the arguments
raised above first.

Another approach I have thought about is using the spare bits in the
byte that signals a new size. If that byte were redesigned, it could look
like this:
3 bits -> number of bytes in the coming delta values (per axis, as in
method 2)
5 bits -> number of vertex-points the new size is valid for. If it is
valid for more than 31 vertex-points, the field should be 0 and the parser
instead looks for the min-value flag, as it works now. This way a lot of
the overhead is removed when the size changes often. The most expensive
overhead is when decreasing from bigger sizes: you need 4 bytes to flag a
change from INT32. This way, switching up to INT32 for 4 vertex-points
costs only 2 bytes in total: 1 byte up and 1 byte down.
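A sketch of how that redesigned size byte could be packed and unpacked. Putting the byte count in the high 3 bits and the vertex count in the low 5 bits is an assumption for illustration; with 5 bits and 0 reserved for "until further notice", explicit counts run from 1 to 31.

```python
def pack_size_change(bytes_per_delta: int, n_vertices: int) -> int:
    # Hypothetical layout: high 3 bits = bytes per delta value (per axis),
    # low 5 bits = number of vertices the new size applies to (1-31),
    # where 0 means the parser falls back to looking for the min-value flag.
    if not 1 <= bytes_per_delta <= 7:
        raise ValueError("bytes_per_delta must fit in 3 bits (1-7)")
    if not 0 <= n_vertices <= 31:
        raise ValueError("n_vertices must fit in 5 bits (0-31)")
    return (bytes_per_delta << 5) | n_vertices

def unpack_size_change(b: int) -> tuple[int, int]:
    # Returns (bytes_per_delta, n_vertices).
    return b >> 5, b & 0x1F

# Switch to 4-byte deltas for the next 4 vertex-points:
b = pack_size_change(4, 4)
print(b, unpack_size_change(b))  # 132 (4, 4)
```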

But maybe things just get too complicated and it is better to go for
something tested like varint. If it isn't slower to parse on the client
side, I think that is the right way. On the server side, from PostGIS, I
don't think it will be measurably slower.

> 
> --> The idea of using GPB actually comes from the PBF encoding of OSM
> data : http://wiki.openstreetmap.org/wiki/PBF_Format

Aha, so it is not a Google idea :-)

> 
> Suggestion if you want more visibility: advertize your spec on
> standards at lists.osgeo.org

Absolutely, maybe that is the right venue for this type of discussion.

Thanks for your interest. Now there are three brains involved, including
Oliver Tonnhofer mentioned above. Some important points about usage for
WebGL were also raised here:
https://groups.google.com/forum/#!msg/ol3-dev/B5CATNLTIb0/DPv-jqv2T_wJ
From that I understand that WebGL is maybe not the first target.

Do you or anyone else know whether a license should be applied to
something like a format, to show that the development is open and
community driven? And how do we make it community driven? How should
decisions about the design be made?

Should I raise questions like that on the standards list?

I find it a little thrilling that this might end up being a good format.

Thanks

Nicklas

> 
> Best regards,
> 
> Even
> 
> -- 
> 
> Geospatial professional services
> 
> http://even.rouault.free.fr/services.html
> 
