<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jun 16, 2017 at 7:19 AM, Albert Godfrind <span dir="ltr"><<a href="mailto:albert.godfrind@oracle.com" target="_blank">albert.godfrind@oracle.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><span class="">Ah OK. 6GB of LAZ would probably be equivalent to 75GB of LAS files. Still not a lot of data. Definitely something I would expect to process on a regular laptop without effort ...</span><div><br><div><span class=""><blockquote type="cite"><div class="m_7665107281840234668WordSection1"><div style="margin:0cm 0cm 0.0001pt;font-size:12pt;font-family:'Times New Roman',serif"><span lang="EN-US" style="font-size:11pt;font-family:Calibri,sans-serif;color:rgb(31,73,125)">I’ve witnessed RAM usage of more than 38 GB (intensive swap usage) on my own dev box (8 cores, 16 GB RAM, SSD), and the process could not finish because it ran out of memory. So extrapolating to the target machine, this is unbearable.</span></div></div></blockquote><div><br></div><div>Looks like one of the processes in the pipeline requires loading the entire (uncompressed) dataset in memory ? Someone in the PDAL dev team may comment here. If so, then that is a bit of a worry regarding scalability ...</div></span></div></div></div></blockquote><div><br></div><div>There are many algorithms that require all the points to be loaded because they care about point locality. If you need to know the nearest neighbors of point X, and you have no idea where in the input dataset(s) the neighbor points are, you have to load all the points or read the data multiple times. If you're doing heavy processing like this, it's not unreasonable to purchase hardware to support it. You can get a server with 512GB of RAM for less than $2500, I think. 
Many people tile their data in order to work around algorithmic limitations when working with very large datasets.<br><br>If points are organized spatially, algorithms can take advantage of that to minimize memory usage and speed up processing, but these algorithms are specialized for particular tasks and for the arrangement of the data. This is a lot of work ($), and when hardware is cheap, I'm not sure how valuable it is unless you're doing lots and lots of this kind of thing. Also, each algorithm may have differing requirements on data arrangement for optimal handling. LAStools triangulation, for example, makes assumptions about the data and does at least three passes over the input in order to reduce memory requirements. It runs quickly, but it's very specific code for a particular purpose. Even then, the benefits decay when moving from 2D to 3D.<br><br>PDAL aims to be generic. If you're interested in pursuing algorithm development that reduces memory requirements, it's something that we can work on with funding, but I'm not sure where it sits on our priority list otherwise.</div><div><br></div></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">Andrew Bell<br><a href="mailto:andrew.bell.ia@gmail.com" target="_blank">andrew.bell.ia@gmail.com</a></div>
</div></div>