[postgis-devel] Review of TWKB spec

Even Rouault even.rouault at mines-paris.org
Thu Jun 20 12:06:14 PDT 2013


> I didn't know about it until Oliver Tonnhofer pointed me to it here
> https://github.com/nicklasaven/TWKB/issues/1
> 
> I have thought some about it, and I think it might be the right way to
> go, but there are two things I think we should consider.
> 1)	From the tests I have done with twkb, if it is used for display
> purposes, 1 byte is often enough for the delta value.
> For instance, using a meter-based projection with whole-meter
> precision, the delta value is very often between -127 meters and +127
> meters. Then only 1 byte is used. With varints, the range for 1 byte
> will be only -63 to +63, and I have a feeling that will quite often lead
> to 2-byte usage. But this I don't know.

That might be true, but by using varints you avoid the need to switch between 
sizes. Only benchmarking on real-world data could tell which is the most 
efficient in practice.
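To put numbers on the size question, here is a small sketch (plain C with stdint types; the function names are hypothetical, not taken from TWKB or OGR) that computes how many bytes one zigzag varint needs for a given delta. It shows the exact ranges: a zigzag varint holds deltas in [-64, 63] in one byte, versus [-128, 127] for a plain signed byte, but it reaches [-8192, 8191] in two bytes without any explicit size switching.

```c
#include <assert.h>
#include <stdint.h>

/* Zigzag mapping as used by Protocol Buffers sint64:
 * 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
 * so that small magnitudes of either sign become small unsigned values. */
static uint64_t zigzag_encode(int64_t v)
{
    /* For negative v, XOR with all ones flips every bit. */
    return ((uint64_t)v << 1) ^ (v < 0 ? ~(uint64_t)0 : 0);
}

/* Bytes needed by a base-128 varint: 7 payload bits per byte. */
static int varint_size(uint64_t u)
{
    int n = 1;
    while (u >= 0x80)
    {
        u >>= 7;
        n++;
    }
    return n;
}

/* Bytes needed to store one signed coordinate delta as a zigzag varint. */
static int delta_size(int64_t delta)
{
    return varint_size(zigzag_encode(delta));
}

/* Boundaries of the 1- and 2-byte ranges: [-64, 63] and [-8192, 8191]. */
static void check_delta_size_boundaries(void)
{
    assert(delta_size(-64) == 1 && delta_size(63) == 1);
    assert(delta_size(-65) == 2 && delta_size(64) == 2);
    assert(delta_size(-8192) == 2 && delta_size(8191) == 2);
    assert(delta_size(-8193) == 3 && delta_size(8192) == 3);
}
```

Summing delta_size() over the deltas of real geometries would give the varint side of such a benchmark; a delta of 100 meters, for example, costs 2 bytes here where a fixed signed byte would still fit it in 1.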

> 2)	I suspect that reading the varint in javascript will use more
> resources. For the client it is now quite straightforward. With varint
> and zigzag encoding there seem to be a few steps before you get
> the INT to use in canvas or wherever. It would be interesting to compare
> performance between the two ways of doing it. What I have found is that
> mobile devices are much slower to parse twkb than even quite old
> laptops. Maybe varint even makes it faster, but I suspect it might get
> slower.

Yes, possibly...

For reference, here's the code I've used in the OGR OSM driver to decode 
varints and zigzag-encoded signed integers:

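#include "cpl_port.h" /* GDAL: GByte, GIntBig, GUIntBig, TRUE */

/* Decode an unsigned base-128 varint and advance *ppabyData past it.
   Note: no bounds checking; the caller must ensure that a complete
   varint is present in the buffer. */
static GIntBig ReadVarInt64(GByte** ppabyData)
{
    GIntBig nVal = 0;
    int nShift = 0;
    GByte* pabyData = *ppabyData;

    while(TRUE)
    {
        int nByte = *pabyData;
        if (!(nByte & 0x80))
        {
            /* High bit clear: this is the last byte of the varint */
            *ppabyData = pabyData + 1;
            return nVal | ((GIntBig)nByte << nShift);
        }
        /* High bit set: accumulate the low 7 bits and continue */
        nVal |= ((GIntBig)(nByte & 0x7f)) << nShift;
        pabyData ++;
        nShift += 7;
    }
}

/* Decode a zigzag-encoded signed varint (0, 1, 2, 3, ... -> 0, -1, 1, -2, ...) */
static GIntBig ReadVarSInt64(GByte** ppabyData)
{
    GIntBig nVal = ReadVarInt64(ppabyData);
    /* un-zig-zag-ging: even values are >= 0, odd values are < 0 */
    if ((nVal & 1) == 0) 
        return (((GUIntBig)nVal) >> 1);
    else
        return -(GIntBig)(((GUIntBig)nVal) >> 1)-1;
}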
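For completeness, a sketch of the encoding side, written against <stdint.h> instead of GDAL's types so that it compiles on its own. The function names are hypothetical (they are not part of OGR). The decoder here adds the length check that the OGR snippet leaves to the caller, and un-zigzags with the branch-free form (v >> 1) ^ -(v & 1), which is equivalent to the if/else in ReadVarSInt64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write an unsigned varint (protobuf / unsigned LEB128 style).
 * Returns the number of bytes written; buf must have room for 10 bytes. */
static size_t write_varint_u64(uint8_t *buf, uint64_t v)
{
    size_t n = 0;
    while (v >= 0x80)
    {
        buf[n++] = (uint8_t)((v & 0x7f) | 0x80); /* low 7 bits + continuation bit */
        v >>= 7;
    }
    buf[n++] = (uint8_t)v; /* last byte: continuation bit clear */
    return n;
}

/* Zigzag-encode a signed value, then write it as a varint. */
static size_t write_varint_s64(uint8_t *buf, int64_t v)
{
    uint64_t zz = ((uint64_t)v << 1) ^ (v < 0 ? ~(uint64_t)0 : 0);
    return write_varint_u64(buf, zz);
}

/* Bounds-checked decoder: returns the number of bytes consumed, or 0 if
 * the input is truncated or longer than the 10-byte maximum. */
static size_t read_varint_s64(const uint8_t *buf, size_t len, int64_t *out)
{
    uint64_t v = 0;
    assert(out != NULL);
    for (size_t i = 0; i < len && i < 10; i++)
    {
        v |= (uint64_t)(buf[i] & 0x7f) << (7 * i);
        if (!(buf[i] & 0x80))
        {
            /* un-zigzag: (v >> 1) XOR -(v & 1) */
            *out = (int64_t)((v >> 1) ^ (0 - (v & 1)));
            return i + 1;
        }
    }
    return 0;
}
```

For example, 300 zigzag-encodes to 600 and is written as the two bytes 0xD8 0x04.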

> 
> I think mobile devices are an important target since twkb should be
> slim and well suited to caching in spatialite or some other structure
> for offline use in the field.

The performance on mobile devices depends both on the transfer time from the 
server and on the CPU power the client needs to parse the geometry blob. But 
both bandwidth and CPU power are increasing with time, so at some point one 
might wonder whether plain WKB would not just do... From reading the TWKB spec, 
I had the impression that the spec assumed that bandwidth was the limiting 
factor. But obviously there's also a trade-off with decompression time... It is 
not easy to find the right balance.
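To make the bandwidth side of that trade-off concrete, here is a rough size comparison, assuming 2D coordinates already quantized to integers. The WKB figure follows the WKB LineString layout (1 byte order + 4 geometry type + 4 point count + two 8-byte doubles per point). The delta figure is only the payload of a simple delta + zigzag varint scheme, with a hypothetical varint point count and the first point delta-coded from (0, 0); it deliberately ignores any TWKB header, so it is not the exact TWKB layout. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zigzag: small magnitudes of either sign -> small unsigned values. */
static uint64_t zigzag_encode(int64_t v)
{
    return ((uint64_t)v << 1) ^ (v < 0 ? ~(uint64_t)0 : 0);
}

/* Bytes needed by a base-128 varint. */
static size_t varint_size(uint64_t u)
{
    size_t n = 1;
    while (u >= 0x80)
    {
        u >>= 7;
        n++;
    }
    return n;
}

/* Size of a 2D LineString in plain WKB. */
static size_t wkb_linestring_size(size_t npoints)
{
    return 1 + 4 + 4 + 16 * npoints;
}

/* Payload of a hypothetical delta-varint encoding of the same line:
 * varint point count, then zigzag-varint dx, dy for each point.
 * xy holds npoints interleaved (x, y) pairs of quantized coordinates. */
static size_t delta_linestring_payload_size(const int64_t *xy, size_t npoints)
{
    size_t size = varint_size(npoints);
    int64_t px = 0, py = 0; /* previous point; the first point is relative to (0, 0) */
    assert(xy != NULL || npoints == 0);
    for (size_t i = 0; i < npoints; i++)
    {
        size += varint_size(zigzag_encode(xy[2 * i] - px));
        size += varint_size(zigzag_encode(xy[2 * i + 1] - py));
        px = xy[2 * i];
        py = xy[2 * i + 1];
    }
    return size;
}
```

For the three points (1000, 2000), (1010, 1990), (1005, 2003), this gives 57 bytes of WKB against 9 bytes of delta payload. It says nothing about the decode-time side of the balance, which still has to be measured on the target devices.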

While you are mentioning SpatiaLite: it also supports compressed geometries, 
but in a simpler fashion.

> 
> > * As we live in a binary world, it might perhaps be better to use
> > 2^precision scaling instead of 10^precision. Someone smart could
> > probably avoid any floating point operation by having fun with the
> > IEEE754 representation of floating point numbers and bit shift
> > operations.
> 
> This I know too little about, but it sounds smart :-)
> As I understand it, all numbers are floats in the javascript world, even
> integers. But I do not know if that means all calculations are floating
> point operations? Anyway, this format should of course be designed for
> more targets than javascript.

My remark about 2^precision was just a vague intuition. By using a scale factor 
that is not a power of 2, I think you perhaps lose some fraction of a bit of the 
mantissa during the scaling. But with FPUs, floating point operations might be 
faster than doing integer operations on the IEEE754 representation.

> 
> > --> The idea of using GPB actually comes from the PBF encoding of OSM
> > data : http://wiki.openstreetmap.org/wiki/PBF_Format
> 
> Aha, so it is not a google idea :-)

No, I meant that the people who designed PBF got the idea from the GPB 
(Google Protocol Buffers) encoding.

> Do you or someone else know whether a license should be applied to
> something like a format, to show that the development is open and
> community driven?

You could perhaps take inspiration from the "Licence" chapter at the bottom of 
https://github.com/mapbox/mbtiles-spec . Basically, you can have a licence for 
the text of the spec itself, and a sentence describing what use can be made 
of the text.

> And how to make it community driven? How to do decisions about
> the design?

A similar example might be how the development of GeoJSON took place. I guess 
the story started with a few people gathering around a beer with a vague idea. 
Someone wrote a draft, and the others amended it. At some point, people were 
happy with the draft and organized a vote to approve it and tag it v1.0 ...

> 
> Should I raise questions like that on the standards list?
> 
> I find it a little thrilling that this could maybe become a good format
> in the end.
> 
> Thanks
> 
> Nicklas
> 
> > Best regards,
> > 
> > Even

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html


