[pgpointcloud] RLE and SIGBITS heuristics
Sabo, Nouri
Nouri.Sabo at RNCan-NRCan.gc.ca
Fri Apr 17 10:01:03 PDT 2015
Thank you for sharing these ideas; many of them could bring real improvements. The prototype we developed at RNCan, described in the attached paper, already implements some of these concepts. For example, it sorts points in Morton order before creating blocks, so each block contains only points that are spatially close, which improves compression. We also use the properties of the Morton curve (Z pattern) to run spatial queries, using Geohash strings as bounding boxes. In a Geohash-based system, the longer the prefix that two hashes share, the closer the two points usually are. Unfortunately, this property does not always hold: two points on either side of a subdivision line can be very close yet share only a short prefix. For this reason we implemented a neighbourhood-based strategy that allows spatial queries based on the hash string.
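To illustrate the subdivision-line problem, here is a minimal Python sketch. It is not code from the RNCan prototype or from libght: the helper names (interleave2, common_prefix_bits) and the tiny 16 x 16 integer grid are assumptions chosen only to make the effect visible. It compares the shared code prefix of two adjacent cells that straddle the central split with that of two distant cells inside the same quadrant.

```python
def interleave2(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a Morton (Z-order) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits go to odd positions
    return code


def common_prefix_bits(a: int, b: int, total_bits: int) -> int:
    """Number of leading bits shared by two codes that are total_bits wide."""
    diff = a ^ b
    return total_bits if diff == 0 else total_bits - diff.bit_length()


BITS = 4  # hypothetical 16 x 16 grid, so codes are 8 bits wide

# Two neighbouring cells on either side of the subdivision line x = 8.
left = interleave2(7, 5, BITS)
right = interleave2(8, 5, BITS)

# Two cells at opposite corners of the same quadrant.
far_a = interleave2(0, 0, BITS)
far_b = interleave2(7, 7, BITS)

shared_adjacent = common_prefix_bits(left, right, 2 * BITS)
shared_far = common_prefix_bits(far_a, far_b, 2 * BITS)

# The adjacent pair shares a shorter prefix than the distant pair.
print(shared_adjacent, shared_far)
```

A prefix match alone would therefore miss the neighbour across the line, which is why a query has to look at the neighbouring cells as well.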
We can also improve compression and performance by changing the Geohash encoding. Currently the hashes are encoded as base-32 strings, which causes a lot of overhead (each 5-bit value is stored in an 8-bit character). Unfortunately, the current libght does not include all the concepts of GeoHashTree.
Oscar, I will read your paper and get back to you so that we can continue the exchange.
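As a rough illustration of that overhead: a 12-character hash takes 12 bytes as text but carries only 60 bits of information, which fit in 8 bytes. The Python sketch below packs the 5-bit values back to back. It is only an assumption of what a binary encoding could look like, not the libght format; pack_geohash and unpack_geohash are hypothetical names, and the hash length has to be stored separately.

```python
GEOHASH_ALPHABET = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash base 32


def pack_geohash(h: str) -> bytes:
    """Pack a geohash string into bytes, 5 bits per character, left-aligned."""
    value = 0
    for ch in h:
        value = (value << 5) | GEOHASH_ALPHABET.index(ch)
    nbits = 5 * len(h)
    nbytes = (nbits + 7) // 8
    value <<= nbytes * 8 - nbits  # zero-pad the last byte on the right
    return value.to_bytes(nbytes, "big")


def unpack_geohash(data: bytes, length: int) -> str:
    """Inverse of pack_geohash; `length` is the number of characters."""
    value = int.from_bytes(data, "big") >> (len(data) * 8 - 5 * length)
    chars = []
    for i in range(length):
        shift = 5 * (length - 1 - i)
        chars.append(GEOHASH_ALPHABET[(value >> shift) & 31])
    return "".join(chars)


h = "f244mkwzrmk7"           # arbitrary example hash, 12 characters
packed = pack_geohash(h)
restored = unpack_geohash(packed, len(h))
print(len(h), len(packed))   # size as text versus size packed, in bytes
```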
Kind regards!
From: Paul Ramsey [mailto:pramsey at cleverelephant.ca]
Sent: 17 April 2015 06:56
To: pgpointcloud at lists.osgeo.org; Peter van Oosterom; Oscar Martinez Rubi; Howard Butler; Rémi Cura
Cc: Sabo, Nouri
Subject: Re: [pgpointcloud] RLE and SIGBITS heuristics
Hi Oscar,
This sounds like a slightly more sophisticated version of the work done at Natural Resources Canada for what they call “geohash tree”. They did find that they got pretty good compression (even with the simple ascii-based key!) using the scheme, and it did allow easy random access to subsets of the data.
The downside was of course the cost of sorting things in the first place, but for a one-time cost on frequently accessed data, it’s not a bad thing. The “libght” soft dependency in pgpointcloud is to a (not so great) implementation of the scheme that I did for them a couple years ago. As a scheme, I think it cuts against the idea of having small patches that is core to the pgpointcloud concept. It makes more and more sense the larger your file is, in that it gets greater and greater leverage for random access.
Paul Ramsey
On April 17, 2015 at 11:02:47 AM, Oscar Martinez Rubi (o.martinezrubi at tudelft.nl) wrote:
Regarding the XYZ binding for better compression: in our research at the Netherlands eScience Center and TU Delft we have been thinking (though not yet testing) about one possible approach.
It is based on space-filling curves. Once you have the points that go in a block, you could compute the Morton/Hilbert code of each XYZ. Since all the points are close together, these codes will be extremely similar, so one could store only the increments, which could fit in very few bits. We have not tested this or compared it with any of the other compressions, but we wanted to share it in case you find it useful!
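A minimal Python sketch of this idea, under stated assumptions: coordinates are already scaled to non-negative integers of at most 21 bits, and "increments" are taken to mean offsets from the smallest code in the block. The names morton3, encode_block and decode_block are hypothetical and are not part of pgpointcloud. Note that the saving depends on the block not straddling a major subdivision of the curve; the example block is deliberately aligned to an 8 x 8 x 8 cell.

```python
def morton3(x: int, y: int, z: int, bits: int = 21) -> int:
    """Interleave the low `bits` bits of x, y, z into one 3D Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code


def encode_block(codes):
    """Store one base code plus a small offset per point (order preserved)."""
    base = min(codes)
    return base, [c - base for c in codes]


def decode_block(base, offsets):
    """Rebuild the full Morton codes from the base and the offsets."""
    return [base + o for o in offsets]


B, Z = 1 << 20, 512  # hypothetical block origin, aligned to an 8 x 8 x 8 cell
points = [(B, B, Z), (B + 3, B + 1, Z + 7), (B + 7, B + 7, Z + 7), (B + 5, B + 2, Z + 4)]

codes = [morton3(x, y, z) for x, y, z in points]
base, offsets = encode_block(codes)

full_bits = max(codes).bit_length()      # bits needed per point without the trick
offset_bits = max(offsets).bit_length()  # bits needed per point with offsets
print(full_bits, offset_bits)
```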
An additional improvement would be to sort the points within each block by Morton code. Then, when doing crop/filter operations on the blocks, one can use the Morton codes for the queries, similarly to what we presented in our papers with the flat table (without blocks); I attach one of them (see section 5.2). In a nutshell: you convert the query region into a set of quadtree/octree nodes, which can in turn be converted to Morton code ranges (thanks to the relation between the Morton/Hilbert curve and a quadtree/octree). You scale the ranges down to increments (as you did when storing the points of the block) and then simply run range queries on the sorted data with a binary search. In this way you avoid decompressing the Morton codes for most of the block. This filtering is equivalent to a bbox filter, so it still requires a point-in-polygon check for some of the points.
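The range-query step can be sketched in Python as follows. This is a 2D toy version under assumptions, not the implementation from the paper: the decomposition of the query region into quadtree nodes is taken as given, node_range and range_query are hypothetical names, and the codes are full Morton codes rather than per-block increments (with increments, the block base would be subtracted from each range bound first).

```python
from bisect import bisect_left, bisect_right


def morton2(x: int, y: int, bits: int) -> int:
    """Interleave the low `bits` bits of x and y into a 2D Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def node_range(qx: int, qy: int, level: int, bits: int):
    """Inclusive Morton code range covered by quadtree node (qx, qy) at `level`."""
    shift = 2 * (bits - level)           # code bits below the node's own prefix
    lo = morton2(qx, qy, level) << shift
    return lo, lo + (1 << shift) - 1


def range_query(sorted_codes, ranges):
    """Return the codes falling in any range, using binary search only."""
    hits = []
    for lo, hi in ranges:
        i = bisect_left(sorted_codes, lo)
        j = bisect_right(sorted_codes, hi)
        hits.extend(sorted_codes[i:j])
    return hits


BITS = 4  # hypothetical 16 x 16 grid
points = [(1, 1), (2, 3), (5, 6), (9, 9), (12, 3), (4, 4), (7, 7)]
codes = sorted(morton2(x, y, BITS) for x, y in points)

# Query the level-2 node (1, 1), i.e. the cells with 4 <= x < 8 and 4 <= y < 8.
hits = range_query(codes, [node_range(1, 1, 2, BITS)])
print(hits)
```

Each node costs two binary searches, so the cost grows with the number of nodes in the decomposition rather than with the number of points in the block.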
Kind Regards,
On 16-04-15 18:15, Rémi Cura wrote:
Epic fail! I had avoided HTML just for you.
Dataset       | subset size   | compressing     | decompressing
              | (Million pts) | (Million pts/s) | (Million pts/s)
Lidar         | 473.3         | 4.49            | 4.67
21 attributes | 105.7         | 1.11            | 2.62
Stereo        | 70            | 2.44            | 7.38
2015-04-16 17:42 GMT+02:00 Sandro Santilli <strk at keybit.net>:
On Thu, Apr 16, 2015 at 05:30:12PM +0200, Rémi Cura wrote:
> OUps
> Dataset | subset size(Million pts) | compressing (Million pts/s) |
> decompressing (Million pts/s)
> Lidar | 473.3 | 4,49
> | __4,67__
> 21 attributes | 105.7 |
> 1,11 | 2,62
> Stereo | 70 | 2,44
> | 7,38
These tables aren't really readable here.
Could you make sure to use a fixed-width font to write those tables
and to keep lines within 70 columns at most ?
pgpointcloud mailing list
pgpointcloud at lists.osgeo.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: GeoHashTree-eng.pdf
Type: application/pdf
Size: 2862776 bytes
Desc: GeoHashTree-eng.pdf
URL: <http://lists.osgeo.org/pipermail/pgpointcloud/attachments/20150417/74b17103/attachment-0001.pdf>