[pdal] [Filters] Decimation built into a reader

GUIMMARA, Sébastien (External) sebastien.guimmara.external at airbus.com
Mon Jun 19 00:47:45 PDT 2017


Indeed, my typical pipelines include stages that are not compatible with the --stream option (outlier, normal…).

My constraint is that I must be able to process any dataset between 50 and 1000 MB (in uncompressed form) on a moderately powerful desktop machine. This is not because we lack processing power on servers, but because the target machines could be used anywhere, without any network connection.

I’m aiming to meet this constraint by decimating the datasets until they fall within an acceptable size range. This decimation should be done in a preliminary pass with the --stream option.
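
To make the idea concrete, here is a minimal sketch in plain Python of such a preliminary pass. It is not PDAL code: the names decimation_step and decimate_stream are hypothetical, and it assumes that output size scales roughly linearly with point count and that "decimation" means keeping every Nth point.

```python
import math
from typing import Iterable, Iterator, Tuple

Point = Tuple[float, float, float]


def decimation_step(input_bytes: int, budget_bytes: int) -> int:
    """Smallest every-Nth step that brings input_bytes to roughly budget_bytes.

    Assumes size is proportional to point count (headers are ignored).
    """
    if budget_bytes <= 0:
        raise ValueError("budget_bytes must be positive")
    return max(1, math.ceil(input_bytes / budget_bytes))


def decimate_stream(points: Iterable[Point], step: int) -> Iterator[Point]:
    """Yield every step-th point while holding only one point at a time."""
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, point in enumerate(points):
        if index % step == 0:
            yield point


if __name__ == "__main__":
    # Hypothetical sizes: a 4000 MB input and a 1000 MB budget.
    step = decimation_step(4000 * 1024**2, 1000 * 1024**2)
    source = ((float(i), 0.0, 0.0) for i in range(10))
    for kept in decimate_stream(source, step):
        print(kept)
```

Because the generator never keeps more than one point, memory use stays constant whatever the input size, which is the property the streaming pass relies on.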

Sébastien

From: Andrew Bell [mailto:andrew.bell.ia at gmail.com]
Sent: Friday, 16 June 2017 17:29
To: Albert Godfrind
Cc: GUIMMARA, Sébastien (External); pdal
Subject: Re: [pdal] [Filters] Decimation built into a reader

On Fri, Jun 16, 2017 at 7:19 AM, Albert Godfrind <albert.godfrind at oracle.com> wrote:
Ah OK. 6GB of LAZ would probably be equivalent to 75GB of LAS files. Still not a lot of data. Definitely something I would expect to process on a regular laptop without effort ...

I’ve witnessed RAM usage of more than 38 GB (intensive swap usage) on my own dev box (8 cores, 16 GB RAM, SSD), and the process could not finish because it ran out of memory. So extrapolating to the target machine, this is unbearable.

It looks like one of the processes in the pipeline requires loading the entire (uncompressed) dataset into memory? Someone on the PDAL dev team may want to comment here. If so, that is a bit of a worry regarding scalability ...

There are many algorithms that require all the points to be loaded because they care about point locality. If you need to know the nearest neighbors of point X, and you have no idea where in the input dataset(s) the neighbor points are, you have to load all the points or read the data multiple times. If you're doing heavy processing like this, it's not unreasonable to purchase hardware to support it. You can get a server with 512GB of RAM for less than $2500, I think. Many people tile their data in order to work around algorithmic limitations when working with very large datasets.
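
As a rough illustration of the tiling workaround, the sketch below bins points into square XY tiles in plain Python. It is not a PDAL API: tile_points and the tile_size parameter are hypothetical names. For brevity it keeps every tile in one dictionary; a real workflow would write each tile to its own file so that only one tile is in memory at a time, and would usually add a buffer around each tile so that neighbor queries near the edges still see the adjacent points.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
TileKey = Tuple[int, int]


def tile_points(points: Iterable[Point], tile_size: float) -> Dict[TileKey, List[Point]]:
    """Group points into square tiles keyed by integer (column, row) indices."""
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    tiles: Dict[TileKey, List[Point]] = defaultdict(list)
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct tile.
        key = (math.floor(x / tile_size), math.floor(y / tile_size))
        tiles[key].append((x, y, z))
    return dict(tiles)


if __name__ == "__main__":
    sample = [(0.5, 0.5, 1.0), (1.5, 0.5, 2.0), (-0.5, 0.5, 3.0), (0.7, 0.2, 4.0)]
    for key, members in sorted(tile_points(sample, 1.0).items()):
        print(key, len(members))
```

Each tile can then be run through the memory-hungry stages separately, so peak memory depends on the densest tile rather than on the whole dataset.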

If points are organized spatially, algorithms can take advantage of that to minimize memory usage and speed up processing, but these algorithms are specialized for particular tasks and for the arrangement of the data. This is a lot of work ($), and when hardware is cheap, I'm not sure how valuable it is unless you're doing lots and lots of this kind of thing. Also, each algorithm may have differing requirements on data arrangement for optimal handling. LAStools triangulation, for example, makes assumptions about the data and does at least three passes of the input in order to reduce memory requirements. It runs quickly, but it's very specific code for a particular purpose. Even then, the benefits decay when moving from 2D to 3D.

PDAL aims to be generic. If you're interested in pursuing algorithm development that reduces memory requirements, it's something that we can work on with funding, but I'm not sure where it sits on our priority list otherwise.

--
Andrew Bell
andrew.bell.ia at gmail.com




