[pdal] [Filters] Decimation built into a reader

Andrew Bell andrew.bell.ia at gmail.com
Fri Jun 16 08:28:58 PDT 2017


On Fri, Jun 16, 2017 at 7:19 AM, Albert Godfrind <albert.godfrind at oracle.com
> wrote:

> Ah OK. 6GB of LAZ would probably be equivalent to 75GB of LAS files. Still
> not a lot of data. Definitely something I would expect to process on a
> regular laptop without effort ...
>
> I’ve witnessed RAM usage of more than 38 GB (intensive swap usage) on my
> own dev box (8 cores, 16 GB RAM, SSD), and the process could not finish
> because it ran out of memory. So extrapolating to the target machine, this
> is unbearable.
>
>
> Looks like one of the processes in the pipeline requires loading the
> entire (uncompressed) dataset in memory ? Someone in the PDAL dev team may
> comment here. If so, then that is a bit of a worry regarding scalability ...
>

There are many algorithms that require all the points to be loaded because
they care about point locality.  If you need to know the nearest neighbors
of point X, and you have no idea where in the input dataset(s) the neighbor
points are, you have to load all the points or read the data multiple
times.  If you're doing heavy processing like this, it's not unreasonable
to purchase hardware to support it.  You can get a server with 512 GB of RAM
for less than $2500, I think.  Many people tile their data in order to work
around algorithmic limitations when working with very large datasets.
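To make the distinction concrete, here is a minimal sketch (hypothetical point data, not PDAL's API): a decimation filter can run over a stream of points with constant memory, while a nearest-neighbor query over unordered input has to examine, and typically hold or index, every point.

```python
# Sketch: why decimation can stream but nearest-neighbor search cannot.
# Points are plain (x, y) tuples; this is illustrative, not PDAL code.

def decimate(points, step=10):
    """Keep every step-th point.  Consumes the input as a stream and
    needs O(1) memory beyond the points it yields."""
    for i, p in enumerate(points):
        if i % step == 0:
            yield p

def nearest_neighbor(points, query):
    """Find the point closest to 'query'.  With unordered input there is
    no way to skip points: every one must be examined, and repeated
    queries push you toward loading everything into a spatial index."""
    best, best_d2 = None, float("inf")
    for x, y in points:
        d2 = (x - query[0]) ** 2 + (y - query[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = (x, y), d2
    return best

pts = [(float(i), float(i % 7)) for i in range(1000)]
kept = list(decimate(iter(pts), step=10))
print(len(kept))                            # 100 points survive
print(nearest_neighbor(pts, (500.2, 3.1)))  # (500.0, 3.0)
```

The decimation generator never looks backward, which is why that kind of filter fits a streaming pipeline; the neighbor search is the shape of algorithm that forces the whole dataset into memory (or multiple passes).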

If points are organized spatially, algorithms can take advantage of that
organization to minimize memory usage and speed processing, but these
algorithms are specialized for particular tasks and data arrangements.  This is
a lot of work ($), and when hardware is cheap, I'm not sure how valuable it
is unless you're doing lots and lots of this kind of thing.  Also, each
algorithm may have differing requirements on data arrangement for optimal
handling.  LAStools triangulation, for example, makes assumptions about the
data and does at least three passes of the input in order to reduce memory
requirements.  It runs quickly, but it's very specific code for a particular
purpose.  Even then, the benefits decay when moving from 2D to 3D.
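The tiling idea mentioned above can be sketched in a few lines (hypothetical grid scheme, not the layout any particular tool uses): bucket points into fixed-size cells so each tile can later be processed independently with bounded memory.

```python
# Sketch of spatial tiling: assign points to fixed-size grid cells so
# each cell's points can be processed on their own.  Hypothetical
# layout; real tiling tools handle many details this ignores.
from collections import defaultdict
from math import floor

def tile(points, size=100.0):
    """Map each (x, y) point to the grid cell that contains it."""
    tiles = defaultdict(list)
    for x, y in points:
        tiles[(floor(x / size), floor(y / size))].append((x, y))
    return tiles

pts = [(i * 1.0, i * 1.0) for i in range(250)]
tiles = tile(pts, size=100.0)
# Points 0-99 land in cell (0, 0), 100-199 in (1, 1), 200-249 in (2, 2).
print(sorted(tiles))  # [(0, 0), (1, 1), (2, 2)]
```

The catch, and part of why this is real work rather than a one-liner, is that neighbor-sensitive algorithms still need points from adjacent tiles near the edges, so practical tilings carry overlap buffers and a merge step.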

PDAL aims to be generic.  If you're interested in pursuing algorithm
development that reduces memory requirements, it's something that we can
work on with funding, but I'm not sure where it sits on our priority list
otherwise.

-- 
Andrew Bell
andrew.bell.ia at gmail.com
