<div dir="ltr">Thanks Howard. That was helpful. I'm also looking forward to any counters.<div><br></div><div>The process in your notebook is exactly what we're doing to get a DEM and compute elevation derivatives (wetness index, slope, etc.) on a parcel by parcel basis. What we're needing now is to precompute those derivatives for a much larger area. What I haven't found yet is the getting started guide for building and maintaining that point cloud...kind of like the 3DEP pipeline on a smaller scale. (disclaimer...I'm not the one on the team tasked with building this pipeline so I haven't looked really hard so apologies if that information is out there somewhere)</div><div><br></div><div>Darrin</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Apr 16, 2024 at 7:41 AM Howard Butler <<a href="mailto:howard@hobu.co">howard@hobu.co</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div><br id="m_5796865444398017072lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On Apr 15, 2024, at 7:07 PM, Darrin Holst via pdal <<a href="mailto:pdal@lists.osgeo.org" target="_blank">pdal@lists.osgeo.org</a>> wrote:</div><br><div><div>Hello all,<br><br>Hope this isn’t too off topic, but was curious if anyone has any experience with pgpointcloud.<br><br>We’ve been consuming the usgs lidar data for a while now, but now we’re in need of creating and maintaining a modest point cloud…~18 billion points.<br><br>I’m looking for any wins, losses, pitfalls, or tips from managing a point cloud in postgis.<br></div></div></blockquote></div><br><div>Darrin,</div><div><br></div><div>Perfectly on topic...</div><div><br></div><div>My experience isn't definitive because I don't have a lot of practical experience with pgpointcloud. Instead we spent nearly five years building out a significant data management system using Oracle Point Cloud before abandoning it for the Entwine (and subsequently COPC) approaches you've seen spin out of this project. </div><div><br></div><div>Here's my experience and recommendation about storing point cloud data in postgis/postgresql through pgpointcloud:</div><div><br></div><div>Don't.</div><div><br></div><div>Here's why...</div><div><br></div><div>* Except in front loaded data production scenarios, users typically read point cloud data many more times than they write it. </div><div>* PostgreSQL brings transactions and ACID, but point clouds aren't financial data, and transactional lifecycles are extra overhead for point cloud data.</div><div>* Point cloud data are i/o-bound when you are trying to figure out what data you want and cpu-bound when you are trying to extract what you need from them. Index types that support the former work against the latter and vice versa. </div><div>* Putting the data in pg means all of your data must come through the database's often singular i/o stack. That's an annoying problem at terrabyte scale and an inconveniently expensive one at petabyte scale – especially to satisfy the simple query of "give me the points inside this box at this resolution". </div><div><br></div><div>But having access to my point clouds in SQL is super convenient. Yes indeed, but there are many ways nowadays to accomplish that. You could write a FDW to Entwine/COPC and manage the data much like you would a big raster pile. You could also write foreign functions that push down queries to a data source type of your choice. The key bit is because you're always reading, these functions can be idempotent and simple.</div><div><br></div><div>I would be very interested to hear the counter argument to my case from the pgpointcloud enthusiasts. What makes it worth the hassle?</div><div><br></div><div>Howard</div><div><br></div><div>A measure of the Hobu team's bonafides on the topic: The Hobu team created the USGS 3DEP Entwine bucket at <a href="https://registry.opendata.aws/usgs-lidar/" target="_blank">https://registry.opendata.aws/usgs-lidar/</a>  With USGS' continued push of content to it, it is currently ~61 trillion points and nearly 320tb in size (see its footprints at <a href="https://usgs.entwine.io" target="_blank">https://usgs.entwine.io</a>). Not only is the data directly renderable in applications like Potree, QGIS, and our Cesium-based renderer called Eptium, it is possible to use it to do a two-stage raster-like query to gather content and do whatever you need with it. See my notebook at <a href="https://colab.research.google.com/drive/1JQpcVFFJYMrJCfodqP4Nc_B0_w6p5WOV#scrollTo=qiKI1JD9VqIr" target="_blank">https://colab.research.google.com/drive/1JQpcVFFJYMrJCfodqP4Nc_B0_w6p5WOV#scrollTo=qiKI1JD9VqIr</a> for an example. </div><div><br></div><div>For the USGS lidar data, it is totally possible to do something like this in PG, but why would you when it already exists and costs nothing to use?</div><div><br></div><div><br></div></div></blockquote></div>