[pgpointcloud] RLE and SIGBITS heuristics

Connor Manning connor at hobu.co
Mon Apr 20 09:03:46 PDT 2015


> Howard: About the Greyhound+S3, what storage solution do you use? It
> is not clear... MongoDB? I mean, where are the points stored? File-based,
> DBMS?

The storage for iowalidar.com is S3 - but could be any key/value storage
system (filesystem, database, a back-end web server supporting PUT/GET,
etc.).
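
As a rough illustration of that decoupling (not Greyhound's actual code),
the storage layer only needs get/put by string key, so any of those
backends can sit behind the same small interface.  A minimal Python sketch
with illustrative names:

    # Minimal sketch of the key/value abstraction: the store only needs
    # get/put by string key, so S3, a filesystem, or a web server can all
    # sit behind the same interface.  Names here are illustrative.
    from abc import ABC, abstractmethod

    class ChunkStore(ABC):
        @abstractmethod
        def put(self, key: str, data: bytes) -> None: ...

        @abstractmethod
        def get(self, key: str) -> bytes: ...

    class S3Store(ChunkStore):
        def __init__(self, bucket_name):
            import boto3  # assumed dependency
            self._bucket = boto3.resource("s3").Bucket(bucket_name)

        def put(self, key, data):
            self._bucket.put_object(Key=key, Body=data)

        def get(self, key):
            return self._bucket.Object(key).get()["Body"].read()

    class FileStore(ChunkStore):
        def __init__(self, root):
            from pathlib import Path
            self._root = Path(root)

        def put(self, key, data):
            (self._root / key).write_bytes(data)

        def get(self, key):
            return (self._root / key).read_bytes()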

There is a single "base" chunk containing some well-known number (N) of
compressed points, which is stored as key "0".  After that, there is a
well-known chunk size (C).  So the next key after "0" is "N", containing C
compressed points, so the subsequent keys are stringified integers
following the form "N + C*x" for x >= 0.  The key for each chunk is the ID
of the first point of that chunk, and the value is the compressed binary
point data starting at that ID.

Currently the chunk size is determined by splitting the first level beyond
the base depth into quadrants.  For example, say the base contains depth
levels [0, 8).  Then the chunk size will be level 8 of the 'tree' split
into 4 chunks, so (4^8)/4 = 16,384 points.  Those four chunks contain the
four quadrants of the bounds of the entire set, at a single level of
detail.  Continuing with that chunk size creates a store where each
subsequent tree level is split into 4 times as many chunks as the previous
level.
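
To make the arithmetic concrete, here is a small Python sketch of that
layout.  It assumes each quadtree level L reserves 4^L point IDs and that
the base spans levels [0, base_depth); the function names are only for
illustration and are not Greyhound's actual code:

    # Sketch of the chunk-key layout described above.  Assumes each tree
    # level L reserves 4^L point IDs and the base holds levels
    # [0, base_depth); names are illustrative.

    def base_point_count(base_depth):
        # Points in levels 0 .. base_depth-1: 1 + 4 + ... + 4^(base_depth-1).
        return sum(4 ** level for level in range(base_depth))

    def chunk_size(base_depth):
        # The first level past the base, split into its four quadrants.
        return (4 ** base_depth) // 4

    def chunk_key(point_id, base_depth):
        # Key of the chunk holding point_id: "0" for the base chunk,
        # otherwise the ID of the first point in that chunk.
        n = base_point_count(base_depth)
        c = chunk_size(base_depth)
        if point_id < n:
            return "0"
        return str(n + c * ((point_id - n) // c))

    # For a base of depth 8: N = 21845, C = (4^8)/4 = 16384, and a point
    # with ID 50000 lives in the chunk keyed "38229".
    assert chunk_key(50000, 8) == str(21845 + 16384)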

From there, given a point ID, we can easily figure out the ID of its chunk
and fetch it from S3 - typically the entire chunk will be used in a query
since the client is traversing its own virtual tree by splitting the bounds
and climbing upward in tree depth.  We are running Greyhound on an EC2
instance so the fetches from S3 are very fast.  The client specifies its
queries via a bounding box and a depth level (or range of levels), from
which we can, in parallel, fetch all chunks selected by this query and
start streaming out the points within range.  A client could also do a bit
more work and fetch directly by chunk ID, but we like the abstraction of
using a depth level and bounding box to decouple the physical storage from
the client queries.
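
To sketch the query path end to end (again, illustrative only): assuming
that within each level the points are laid out in Morton (Z) order of that
level's quadtree cells, a bounding box plus a depth can be flattened into a
set of chunk keys and fetched in parallel.  That ordering assumption and
all names below are mine, not a description of Greyhound's internals:

    # Hedged sketch: flatten a bbox + depth query into chunk keys, assuming
    # points at each level are stored contiguously in Morton (Z) order of
    # that level's quadtree cells.  All names are illustrative.
    from concurrent.futures import ThreadPoolExecutor

    def level_start(depth):
        # First point ID belonging to `depth` (levels hold 4^L IDs each).
        return sum(4 ** level for level in range(depth))

    def morton(ix, iy, depth):
        # Interleave x/y cell indices into a Z-order index.
        code = 0
        for bit in range(depth):
            code |= ((ix >> bit) & 1) << (2 * bit)
            code |= ((iy >> bit) & 1) << (2 * bit + 1)
        return code

    def chunk_keys(bbox, depth, base_depth, full_bounds):
        # Enumerate level-`depth` cells intersecting `bbox` and collect the
        # distinct chunk keys they map to (depth must be >= base_depth).
        qx0, qy0, qx1, qy1 = bbox
        fx0, fy0, fx1, fy1 = full_bounds
        cells = 2 ** depth                     # cells per axis at this depth
        cw, ch = (fx1 - fx0) / cells, (fy1 - fy0) / cells
        n = level_start(base_depth)            # points in the base chunk
        c = (4 ** base_depth) // 4             # chunk size, as above
        keys = set()
        for iy in range(max(0, int((qy0 - fy0) // ch)),
                        min(cells - 1, int((qy1 - fy0) // ch)) + 1):
            for ix in range(max(0, int((qx0 - fx0) // cw)),
                            min(cells - 1, int((qx1 - fx0) // cw)) + 1):
                point_id = level_start(depth) + morton(ix, iy, depth)
                keys.add(str(n + c * ((point_id - n) // c)))
        return keys

    def fetch_all(keys, fetch):
        # Fetch the selected chunks in parallel, e.g. fetch = store.get.
        with ThreadPoolExecutor() as pool:
            return list(pool.map(fetch, keys))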

- Connor

On Mon, Apr 20, 2015 at 9:03 AM, Oscar Martinez Rubi <
o.martinezrubi at tudelft.nl> wrote:

>  Hi,
>
> On 20-04-15 13:13, Rémi Cura wrote:
>
>  Hey Oscar,
>
>  I'm a really big fan of lidar for archaeological use, and integrating
> time into it is especially trendy and challenging. Registering all the
> point clouds together from different sources must have been really difficult.
>
>
> That is really tricky indeed! At NLeSC we worked on an automatic
> open-source alignment tool (
> https://github.com/NLeSC/PattyAnalytics/blob/master/scripts/registration.py)
> which works for some cases when aligning point clouds from archaeological
> monuments (from photogrammetry) with a lidar dataset. For other cases we
> have a manual alignment tool, a 3D desktop viewer based on
> OpenSceneGraph (which can also display meshes and pictures).
>
>
>
>  I contacted the Potree developer one year ago to ask him if it was possible
> to modify it to read points from a DBMS (actually patches with LOD).
>  He said it was possible and not too difficult.
>  I don't know what point throughput you get, but we demonstrated around
> 20 kpts/s streaming to the browser (with a lot of
> serialization/deserialization). Currently the upper limit for such output
> would be a few hundred kpts/s if you send points, and a few
> million pts/s if you stream compressed patches.
>
>
> Currently we are getting around 200 kpts/sec using the LAS format (I don't
> remember how much we got with LAZ), but we also have a not-so-good
> server... so I think the same solution could give a bit more in other
> situations. Anyway, if you say compressed patches in the DB could deliver a
> few million points/sec, that should be more than enough! It would be nice to try!
>
>
>  Cheers
>
> Regards,
>
> O.
>
>
>
> 2015-04-20 11:58 GMT+02:00 Oscar Martinez Rubi <o.martinezrubi at tudelft.nl>
> :
>
>>  Whoa!
>>
>> Thanks guys for all the material! I am now busy reading it all!
>>
>> Rémi: I had to read your mail a few times ;) Great slides (I actually
>> looked at all of them, very well done!). Very interesting topics you are
>> researching!
>>
>> Howard: About the Greyhound+S3, what storage solution do you use? It
>> is not clear... MongoDB? I mean, where are the points stored? File-based,
>> DBMS?
>>
>> Paul+Nouri: The geohash tree that Nouri mentions is the ght compression
>> in pgpointcloud, right? I tried it once but there was a limitation with
>> the type of coordinates: they had to be lon/lat, so I guess there needs
>> to be a reference system transformation in between, right? Is there any
>> place where I can find an example of how to use this?
>>
>>
>> At NLeSC, for the visualization of data we are using a system based on the
>> potree visualization (so, file-based), but I am very interested in the
>> stuff you guys are doing and I would love to be convinced that DBMS
>> solutions can be really efficient for visualization as well (I think it is
>> close now!). We chose file-based and potree because of the initial lack of
>> LoD support in DBMS, the speed of the file-based approach, and the super
>> compressed LAZ storage.
>>
>> To see what we have done so far:
>>
>> https://github.com/NLeSC/PattyVis
>> https://www.esciencecenter.nl/project/mapping-the-via-appia-in-3d
>> (see the video from 1:40 for the potree based visualization)
>>
>> One of the many reasons I would love to be convinced about DBMS is that
>> we are now considering how to visualize the 640B AHN2 dataset, and with a
>> pure file-based solution (like the potree-based one) I fear that when
>> restructuring the data into an octree we would need a number of octree
>> nodes/files probably larger than what ext4 can handle! We will try (I'll
>> let you know how that goes ;)), but it would be really nice to have an
>> efficient and fast DBMS-based alternative!
>>
>> I am very happy though with all the different work you are all doing and
>> excited to see how fast things improve and evolve!!
>>
>> Keep on like this guys!
>>
>> Regards,
>>
>> O.
>>
>>
>>
>> On 17-04-15 19:01, Sabo, Nouri wrote:
>>
>>  Hi,
>>
>> Thank you for sharing these ideas. Many of them can bring improvements.
>> In the prototype we have developed at RNCan, which we mention in the
>> attached paper, we have implemented some of these concepts. For example,
>> in the prototype we sort points according to the Morton pattern before
>> creating blocks, so each block is composed only of points that are
>> spatially close, thereby improving the level of compression. We also use
>> the properties of the Morton curve (Z pattern) to do spatial queries using
>> a Geohash as the BBox. Usually, in a Geohash-based system, the more the
>> Geohash prefixes of two points resemble one another, the closer the points
>> are spatially. Unfortunately, this property does not always hold for two
>> points located on either side of a subdivision line. For this reason we
>> implemented a neighbourhood-based strategy to allow spatial queries based
>> on the hash string.
>>
>> Also, to improve compression and performance, we could change the
>> encoding of the Geohash. Currently, the hashes are encoded as base-32
>> strings, which causes a lot of overhead (each 5 bits are inflated into an
>> 8-bit character). Unfortunately, the current libght does not include all
>> the concepts of GeoHashTree.
>>
>> Oscar, I will read your paper and get back to you so we can continue the
>> exchange.
>>
>> Kind regards!
>>
>>
>>
>> Nouri,
>>
>>
>>
>>
>>
>> *From:* Paul Ramsey [mailto:pramsey at cleverelephant.ca]
>> *Sent:* 17 avril 2015 06:56
>> *To:* pgpointcloud at lists.osgeo.org; Peter van Oosterom; Oscar Martinez
>> Rubi; Howard Butler; Rémi Cura
>> *Cc:* Sabo, Nouri
>> *Subject:* Re: [pgpointcloud] RLE and SIGBITS heuristics
>>
>>
>>
>> Hi Oscar,
>>
>> This sounds like a slightly more sophisticated version of the work done
>> at Natural Resources Canada for what they call “geohash tree”. They did
>> find that they got pretty good compression (even with the simple
>> ascii-based key!) using the scheme, and it did allow easy random access to
>> subsets of the data.
>>
>>
>>
>> http://2013.foss4g.org/conf/programme/presentations/60/
>>
>>
>>
>> The downside was of course the cost of sorting things in the first place,
>> but for a one-time cost on frequently accessed data, it’s not a bad thing.
>> The “libght” soft dependency in pgpointcloud is to a (not so great)
>> implementation of the scheme that I did for them a couple years ago. As a
>> scheme, I think it cuts against the idea of having small patches that is
>> core to the pgpointcloud concept. It makes more and more sense the larger
>> your file is, in that it gets greater and greater leverage for random
>> access.
>>
>> ATB,
>>
>> P.
>>
>>
>>
>> --
>> Paul Ramsey
>> http://cleverelephant.ca
>>
>> http://postgis.net
>>
>>
>>
>> On April 17, 2015 at 11:02:47 AM, Oscar Martinez Rubi (
>> o.martinezrubi at tudelft.nl) wrote:
>>
>>  Hi,
>>
>> About the XYZ binding for better compression: in our research at the NL
>> eScience Center and TU Delft we have been thinking (not testing yet,
>> though) about one possible approach for this.
>>
>> It is based on using space-filling curves. Once you have the points
>> that go in a block, you could compute the Morton/Hilbert code of the XYZ.
>> Since all the points are close together, such codes will be extremely
>> similar, so one could store only the increments, which could fit in very
>> few bits. We have not tested or compared this with any of the other
>> compressions, but we wanted to share it with you in case you find
>> it useful!
>>
>> An additional improvement would be to sort the points within the blocks
>> according to the Morton code. Then, when doing crop/filter operations on
>> the blocks, one can use the Morton codes for the queries, similarly to what
>> we presented in our papers with the flat table (without blocks); I attach
>> one of them (see section 5.2). In a nutshell: you convert the query region
>> into a set of quadtree/octree nodes, which can also be converted to Morton
>> code ranges (thanks to the relation between the Morton/Hilbert curve and a
>> quadtree/octree). You scale down the ranges to increments (like you did
>> when storing the points of the block) and then you simply do range queries
>> on the sorted data with a binary search. In this way you avoid
>> decompressing the Morton code for most of the block. This filtering is
>> equivalent to a bbox filter, so it still requires a point-in-polygon check
>> for some of the points.
>>
>> Kind Regards,
>>
>> Oscar.
>>
>>  On 16-04-15 18:15, Rémi Cura wrote:
>>
>>  epic fail ! I had avoided html just for you
>>
>>
>> Dataset       | subset size   | compressing     | decompressing   |
>>               | (Million pts) | (Million pts/s) | (Million pts/s) |
>> Lidar         |     473.3     |      4,49       |       4,67      |
>> 21-attributes |     105.7     |      1,11       |       2,62      |
>> Stereo        |      70       |      2,44       |       7,38      |
>>
>> Cheers
>>
>>
>>
>> 2015-04-16 17:42 GMT+02:00 Sandro Santilli <strk at keybit.net>:
>>
>> On Thu, Apr 16, 2015 at 05:30:12PM +0200, Rémi Cura wrote:
>> > OUps
>> >
>> > Dataset        |  subset size(Million pts) | compressing (Million
>> pts/s) |
>> > decompressing (Million pts/s)
>> > Lidar           |            473.3                |               4,49
>> >               |             __4,67__
>> > 21 attributes |           105.7                 |
>> > 1,11                     |             2,62
>> > Stereo         |              70                  |                2,44
>> >                |             7,38
>>
>> These tables aren't really readable here.
>> Could you make sure to use a fixed-width font to write those tables
>> and to keep lines within 70 columns at most ?
>>
>> --strk;
>>
>>  _______________________________________________
>> pgpointcloud mailing list
>> pgpointcloud at lists.osgeo.org
>> http://lists.osgeo.org/cgi-bin/mailman/listinfo/pgpointcloud
>>
>>
>
>
> _______________________________________________
> pgpointcloud mailing list
> pgpointcloud at lists.osgeo.org
> http://lists.osgeo.org/cgi-bin/mailman/listinfo/pgpointcloud
>

