[postgis-devel] TWKB_agg broken ?

Nicklas Avén nicklas.aven at jordogskog.no
Thu Aug 7 04:28:18 PDT 2014

2014-08-07 Sandro Santilli  wrote:

On Thu, Aug 07, 2014 at 01:09:54PM +0200, Nicklas Avén wrote:
>> Ah, actually delta values will probably save you some space.
>> 
>> 
>> The coordinates are saved as signed varints. 1 byte only reaches from -64 to 63.
>> 1 bit is lost in the varint format to signal whether more bytes follow,
>> and since it is signed, another bit is lost to the sign.
>> 
>> 
>> So all values outside -64 to 63 will need 2 bytes in the ST_AsTWKB version, but in the agg version many of them will only use 1 byte, if the per-axis delta from the previous point is between -64 and 63.
>
>Got it, but that's still a max of x2 size difference,
>while we're talking about a x3 size difference here.


No, it actually makes perfect sense :-)
I didn't notice until now that all your points were that concentrated.


This is what you will get:


ST_AsTWKB per point
header 1 byte
type 1 byte
coordinates 2x2 bytes (if 2d)


ST_AsTWKBagg per point
coordinates 2x1 bytes (your bbox tells us that no delta value will exceed the 1-byte limit)




So ST_AsTWKB will need 6 bytes per point, and ST_AsTWKBagg will need 2 bytes per point + 3 bytes for header, type and npoints at the beginning.
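
As a rough sketch of the arithmetic above (not the PostGIS implementation), the Python below counts varint bytes and totals both cases for 10000 points. It assumes the signed varint uses zigzag encoding, which is consistent with the -64 to 63 one-byte range, and it takes the 3-byte figure for header, type and npoints at face value. The function names are made up for illustration.

```python
def zigzag(n: int) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return 2 * n if n >= 0 else -2 * n - 1


def varint_size(n: int) -> int:
    """Bytes needed for a signed varint with 7 payload bits per byte."""
    u = zigzag(n)
    size = 1
    while u >= 0x80:  # a continuation bit is needed, so another byte follows
        u >>= 7
        size += 1
    return size


# One byte covers -64..63; values just outside that range need two bytes.
print([varint_size(v) for v in (-64, 63, -65, 64, -169, -79)])

n_points = 10000
per_point_twkb = 1 + 1 + 2 * 2   # header + type + two 2-byte coordinates
per_point_agg = 2 * 1            # two 1-byte deltas
total_twkb = n_points * per_point_twkb
total_agg = 3 + n_points * per_point_agg  # 3 bytes: header, type, npoints
print(total_twkb, total_agg)
```

That gives roughly the 60k versus 20k bytes reported for the 10000-point test.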


If you use 5 decimals instead of 0, the difference will be much smaller.
But the twkbagg version can be tuned by ordering the input geometries so that consecutive ones are, statistically, closer to each other.


I have done some tests on real world data and I saved almost 30% by ordering on one of the axes, like:


SELECT ST_AsTWKBagg(geom, 5) FROM
(SELECT geom FROM the_table ORDER BY ST_X(geom)) a;


But that of course depends a lot on the data set.
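
To illustrate why the ordering matters, here is a small Python sketch that delta-encodes synthetic points and compares the byte count in random order against the same points sorted on x. It is only a model of the idea: the points are uniformly random inside a box similar to the one in this thread, zigzag varints are assumed, and the helper names are hypothetical. The saving on such data will not match the 30% seen on real data.

```python
import random


def varint_size(n: int) -> int:
    """Bytes needed for a zigzag-encoded signed varint (7 payload bits per byte)."""
    u = 2 * n if n >= 0 else -2 * n - 1
    size = 1
    while u >= 0x80:
        u >>= 7
        size += 1
    return size


def delta_encoded_size(points, decimals):
    """Total coordinate bytes when each point is stored as a delta from the previous one."""
    scale = 10 ** decimals
    prev_x = prev_y = 0
    total = 0
    for x, y in points:
        ix, iy = round(x * scale), round(y * scale)
        total += varint_size(ix - prev_x) + varint_size(iy - prev_y)
        prev_x, prev_y = ix, iy
    return total


random.seed(0)  # fixed seed so repeated runs give the same numbers
points = [(random.uniform(-169.5, -160.7), random.uniform(-79.6, -71.3))
          for _ in range(10000)]

unsorted_size = delta_encoded_size(points, 5)
sorted_size = delta_encoded_size(sorted(points), 5)  # tuples sort on x first
print(unsorted_size, sorted_size)
```

Sorting on x shrinks the x deltas only; the y deltas stay as scattered as before, which is one reason the gain depends so much on the data.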


/Nicklas

>
>--strk;
>
>> 
>> 
>> 2014-08-07 Sandro Santilli  wrote:
>> 
>> On Thu, Aug 07, 2014 at 10:44:40AM +0200, Nicklas Avén wrote:
>> >> 
>> >> 
>> >> 2014-08-07 Sandro Santilli  wrote:
>> >> 
>> >> On Wed, Aug 06, 2014 at 07:04:34PM +0200, Nicklas Avén wrote:
>> >> >> On on., 2014-08-06 at 18:40 +0200, Sandro Santilli wrote:
>> >> >> > Nicklas, I was looking at ST_AsTWKB_agg but I think it's broken.
>> >> >> > Why would otherwise the output of ST_AsTWKB_agg be smaller than
>> >> >> > the output of ST_AsTWKB ?
>> >> >> 
>> >> >> It is possible because the aggregate version uses delta values between
>> >> >> the points which often gives smaller values to store. 
>> >> >
>> >> >Is there no separator between records in the output of the aggregator ?
>> >> >How's it different from collecting the geometries and passing the
>> >> >collection to ST_AsTWKB ?
>> >> 
>> >> The difference is that by using ST_AsTWKBagg you won't lose the individual IDs.
>> >> If you use ST_Collect it will just be a collection.
>> >
>> >I requested no IDs, the queries were:
>> >
>> > ST_AsTWKB   (g,0)
>> > ST_AsTWKBagg(g,0)
>> >
>> >The dataset I selected (10000 points) has this extent:
>> >
>> > BOX(-169.499567541294 -79.583924216684,-160.680307442788 -71.3369439435191)
>> >
>> >I specified 0 as number of digits.  This means that any absolute value
>> >also fits a single byte, right ?
>> >
>> >So, if IDs are absent in both and delta values fit within one byte in both...
>> >what else could make 10k points take 60k bytes with the non-aggregate
>> >and  20k bytes (1/3) with the aggregate call ?
>> >
>> >--strk;
>> >
>> >
>
>-- 
>
> ()  ASCII ribbon campaign  --  Keep it simple !
> /\  http://strk.keybit.net/rants/ascii_mails.txt  
>
>