[pgpointcloud] RLE and SIGBITS heuristics

Rémi Cura remi.cura at gmail.com
Wed Apr 15 04:42:51 PDT 2015


Maybe you would be interested in the paper from LAStools about how they
compress the data:
[the paper](http://lastools.org/download/laszip.pdf)

2015-04-15 13:34 GMT+02:00 Sandro Santilli <strk at keybit.net>:

> On Thu, Apr 09, 2015 at 12:41:32PM +0200, Sandro Santilli wrote:
> > Reading the code I found that SIGBITS is used when it gains
> > a compression ratio of 1.6:1, while RLE is required to reach a 4:1
> > ratio (but the comment talks about 4:1 for both).
> >
> > How were the ratios decided?
> >
> >
> https://github.com/pgpointcloud/pointcloud/blob/v0.1.0/lib/pc_dimstats.c#L121-L137
>
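> As I read it, the per-dimension choice boils down to something like the
> sketch below. This is only an illustrative paraphrase: the function and
> variable names are mine, and only the 4:1 and 1.6:1 thresholds come from
> the code linked above.
>
>   /* Illustrative paraphrase of the per-dimension compression choice;
>    * hypothetical names, not the actual pc_dimstats.c identifiers. */
>   typedef enum { DIM_ZLIB, DIM_RLE, DIM_SIGBITS } dim_compression_t;
>
>   dim_compression_t
>   choose_dim_compression(double avg_run_length,
>                          double common_bits, double total_bits)
>   {
>       /* RLE only when runs are long enough for roughly a 4:1 gain
>        * (approximated here by the average run length). */
>       if ( avg_run_length >= 4.0 )
>           return DIM_RLE;
>
>       /* SIGBITS keeps only the non-common bits of each value, so a
>        * 1.6:1 gain needs common_bits/total_bits >= 1 - 1/1.6 = 0.375. */
>       if ( common_bits / total_bits >= 1.0 - 1.0 / 1.6 )
>           return DIM_SIGBITS;
>
>       /* Otherwise fall back to zlib. */
>       return DIM_ZLIB;
>   }
>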
> As an experiment, I created a patch containing 260,000 points, organized
> so that the data in each dimension is laid out to make the heuristic
> pick one of the 3 different encodings (see the C sketch after the list):
>
>  All dimensions are of type int16_t
>  - 1st dimension alternates values 0 and 1
>  - 2nd dimension has value -32768 for the first 130k points, then 32767
>  - 3rd dimension alternates values -32768 and 32767
>
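> In plain C, that layout looks roughly like this (the same pattern as
> described above, not the actual script used to build the patch):
>
>   #include <stdint.h>
>
>   #define NPOINTS 260000
>
>   int16_t dim1[NPOINTS], dim2[NPOINTS], dim3[NPOINTS];
>
>   void fill_test_dimensions(void)
>   {
>       for ( int i = 0; i < NPOINTS; i++ )
>       {
>           dim1[i] = i % 2;                        /* alternates 0 and 1 */
>           dim2[i] = i < 130000 ? -32768 : 32767;  /* two long runs */
>           dim3[i] = i % 2 ? 32767 : -32768;       /* alternates the extremes */
>       }
>   }
>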
> Then I checked the size of the patch after applying different compression
> schemes, and here's what I got:
>
>   size (bytes) | compression
>  --------------+-------------------------------
>           1680 | {zlib,zlib,zlib}
>           4209 | {zlib,auto,auto}  <-- zlib much better than sigbits!
>          33656 | {auto,zlib,auto}  <-- zlib better than rle
>          36185 | {auto,auto,auto}  <-- DEFAULT, effectively {sigbits,rle,zlib}
>          36185 | {auto,auto,zlib}
>        1072606 | {sigbits,sigbits,sigbits}
>        1560073 | {uncompressed}    <-- UNCOMPRESSED size
>        1563148 | {rle,rle,rle}
>
> Interestingly enough, "zlib" results in better compression than
> both "sigbits" and "rle", and we're supposedly looking at their
> best-case performance here (only 2 runs for rle, 15 bits out of 16 in
> common for sigbits).
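>
> For a rough sanity check of those numbers, here is some back-of-the-envelope
> arithmetic on the layout above (plain C, idealized, ignoring all header
> overhead, and not using any pointcloud code):
>
>   #include <stdio.h>
>
>   int main(void)
>   {
>       long npoints = 260000;
>
>       /* 3 int16_t dimensions, 2 bytes each: ~1,560,000 bytes, close to
>        * the 1,560,073 "uncompressed" figure above. */
>       long uncompressed = npoints * 3 * 2;
>
>       /* sigbits on the 1st dimension: 15 of the 16 bits are common, so
>        * ideally 1 bit per value remains: ~32,500 bytes, roughly the gap
>        * between {zlib,zlib,zlib} (1,680) and {auto,zlib,auto} (33,656). */
>       long sigbits_dim1 = npoints / 8;
>
>       /* sigbits cannot help on the 2nd and 3rd dimensions: -32768 (0x8000)
>        * and 32767 (0x7fff) have no bits in common, so those stay at about
>        * 520,000 bytes each, which is why {sigbits,sigbits,sigbits} lands
>        * near 32,500 + 2 * 520,000 = 1,072,500. */
>       long sigbits_all = sigbits_dim1 + 2 * npoints * 2;
>
>       printf("uncompressed     ~%ld bytes\n", uncompressed);
>       printf("sigbits on dim 1 ~%ld bytes\n", sigbits_dim1);
>       printf("sigbits on all 3 ~%ld bytes\n", sigbits_all);
>       return 0;
>   }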
>
> It might be a particularly lucky case for zlib, given the very regular
> pattern of the value distributions, but now I'm wondering... how would
> zlib perform on real-world datasets out there?
>
> If you're running the code from the current master branch, you can test
> this yourself with a query like this:
>
>   \set c mycol -- set to the name of your column
>   \set t mytab -- set to the name of your table
>   SELECT sum(pc_memsize(
>               pc_compress(:c, 'dimensional', array_to_string(
>                -- build a 'zlib,zlib,...' config covering up to 100 dimensions
>                array_fill('zlib'::text, ARRAY[100]), ','
>               ))))::float8 /
>          sum(pc_memsize(:c))
>   FROM :t;
>
> It will tell you how much smaller your dataset could get by compressing
> it all with the dimensional/zlib scheme.
>
> I get 0.046 with my test dataset above (260k points in a single patch),
> while it is 1.01 (bigger) on a dataset where patches have 1000 points each
> and 3 of the 12 dimensions are already compressed with zlib, another 3 with
> sigbits, and the remaining 6 with rle.
>
> How about you?
>
> --strk;
> _______________________________________________
> pgpointcloud mailing list
> pgpointcloud at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/pgpointcloud
>