<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><br id="lineBreakAtBeginningOfMessage"><div><br><blockquote type="cite"><div>On Apr 15, 2024, at 7:07 PM, Darrin Holst via pdal <pdal@lists.osgeo.org> wrote:</div><br class="Apple-interchange-newline"><div><div>Hello all,<br><br>Hope this isn’t too off topic, but was curious if anyone has any experience with pgpointcloud.<br><br>We’ve been consuming the usgs lidar data for a while now, but now we’re in need of creating and maintaining a modest point cloud…~18 billion points.<br><br>I’m looking for any wins, losses, pitfalls, or tips from managing a point cloud in postgis.<br></div></div></blockquote></div><br><div>Darrin,</div><div><br></div><div>Perfectly on topic...</div><div><br></div><div>My experience isn't definitive because I don't have a lot of practical experience with pgpointcloud. Instead we spent nearly five years building out a significant data management system using Oracle Point Cloud before abandoning it for the Entwine (and subsequently COPC) approaches you've seen spin out of this project. </div><div><br></div><div>Here's my experience and recommendation about storing point cloud data in postgis/postgresql through pgpointcloud:</div><div><br></div><div>Don't.</div><div><br></div><div>Here's why...</div><div><br></div><div>* Except in front loaded data production scenarios, users typically read point cloud data many more times than they write it. </div><div>* PostgreSQL brings transactions and ACID, but point clouds aren't financial data, and transactional lifecycles are extra overhead for point cloud data.</div><div>* Point cloud data are i/o-bound when you are trying to figure out what data you want and cpu-bound when you are trying to extract what you need from them. Index types that support the former work against the latter and vice versa. </div><div>* Putting the data in pg means all of your data must come through the database's often singular i/o stack. That's an annoying problem at terrabyte scale and an inconveniently expensive one at petabyte scale – especially to satisfy the simple query of "give me the points inside this box at this resolution". </div><div><br></div><div>But having access to my point clouds in SQL is super convenient. Yes indeed, but there are many ways nowadays to accomplish that. You could write a FDW to Entwine/COPC and manage the data much like you would a big raster pile. You could also write foreign functions that push down queries to a data source type of your choice. The key bit is because you're always reading, these functions can be idempotent and simple.</div><div><br></div><div>I would be very interested to hear the counter argument to my case from the pgpointcloud enthusiasts. What makes it worth the hassle?</div><div><br></div><div>Howard</div><div><br></div><div>A measure of the Hobu team's bonafides on the topic: The Hobu team created the USGS 3DEP Entwine bucket at <a href="https://registry.opendata.aws/usgs-lidar/">https://registry.opendata.aws/usgs-lidar/</a> With USGS' continued push of content to it, it is currently ~61 trillion points and nearly 320tb in size (see its footprints at <a href="https://usgs.entwine.io">https://usgs.entwine.io</a>). Not only is the data directly renderable in applications like Potree, QGIS, and our Cesium-based renderer called Eptium, it is possible to use it to do a two-stage raster-like query to gather content and do whatever you need with it. See my notebook at <a href="https://colab.research.google.com/drive/1JQpcVFFJYMrJCfodqP4Nc_B0_w6p5WOV#scrollTo=qiKI1JD9VqIr">https://colab.research.google.com/drive/1JQpcVFFJYMrJCfodqP4Nc_B0_w6p5WOV#scrollTo=qiKI1JD9VqIr</a> for an example. </div><div><br></div><div>For the USGS lidar data, it is totally possible to do something like this in PG, but why would you when it already exists and costs nothing to use?</div><div><br></div><div><br></div></body></html>