<p dir="ltr">This may or may not be related, but a few months ago, while working with pgpointcloud, I noticed that some queries could be written a slow way and a fast way. I haven't had a chance to work with it lately, but if memory serves (and it may not), there was a big performance difference on large datasets (Twin Cities metro area lidar) between using the WITH syntax for the select statement and using more typical joins.</p>


<p dir="ltr">IIRC, the WITH version would build a temp table on disk and wait until all rows were processed before returning the first record. A single-level select would stream results back as they were calculated and didn't use temp disk space.</p>
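
<p dir="ltr">Here is a rough sketch of the two query shapes I mean. It assumes the cloud table and pa patch column from Howard's example below, plus a dimension named z in the schema. I haven't rerun either query, so treat it as an illustration rather than a benchmark. As far as I understand it, in the PostgreSQL releases I was using a WITH query acts as an optimization fence and its result is materialized.</p>

<pre>
-- Slow shape (as I remember it): the WITH result is materialized and can
-- spill to temp files once it outgrows work_mem.
WITH pts AS (
    SELECT PC_Explode(pa) AS pt   -- one row per point in each patch
    FROM cloud
)
SELECT PC_Get(pt, 'z') AS z
FROM pts;

-- Fast shape: a single-level select with no intermediate result to store,
-- so rows can be handed back as they are produced.
SELECT PC_Get(PC_Explode(pa), 'z') AS z
FROM cloud;
</pre>

<p dir="ltr">Running EXPLAIN on both should show a CTE Scan node only in the first plan, which is an easy way to check whether this applies to a given query.</p>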


<div class="gmail_quote">On Dec 14, 2013 6:33 PM, "Howard Butler" <<a href="mailto:howard@hobu.co">howard@hobu.co</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

On Dec 14, 2013, at 5:43 PM, Paul Ramsey <<a href="mailto:pramsey@cleverelephant.ca">pramsey@cleverelephant.ca</a>> wrote:<br>

<br>

>> So I have endeavored to process and load the entire State of Iowa's<br>

>> LiDAR data into pgpointcloud using PDAL. I currently have 1.24<br>

>> million patches at ~120k points per patch. I have some questions<br>

>> on how to use this holding more effectively.<br>

>><br>

>> 1) Is there an equivalent to this when the pointcloud_postgis<br>

>> extension is enabled? Or did I miss something? The docs don't<br>

>> say too much about indexing other than using the chipper to create<br>

>> patches.<br>

><br>

> Arg, yeah, there’s a big hole in indexing currently. Basically you have to enable pointcloud_postgis and then<br>

><br>

> CREATE INDEX patches_idx on cloud using GIST (Geometry(patchcol));<br>

><br>

> And since it’s a functional index, you then need to make sure that one of your terms is “Geometry(patches)” when trying to invoke it.<br>

<br>

Does this mean on-the-fly pc-to-geom conversion of every candidate? Is that expensive? Is it worth it to cache an explicit geometry for each patch/row in my table and simply interact with that spatially?<br>

<br>

<br>

>> 2) Does a simple count, ala "select sum(PC_NumPoints(pa)) from<br>

>> cloud", mean unpacking every patch all the way?<br>

><br>

> No, the npoints is stored in the header, so it should only mean a peek into the front of the page.<br>

<br>

I haven't gotten an answer to my PC_NumPoints query a few hours later. Does fetching the header mean lighting up the entire object? I ask because I wonder if I should use more rows/patches to store the data instead of bigger, fuller patches. Using 400-point patches would mean ~390 million patches/rows for this data. Do tables with that many rows in them work out very well? For massive point cloud data, the choice is either lots of rows or fat rows. Which way does PostgreSQL seem to favor?<br>


<br>

In Oracle land, we would want fat rows because the cost of lighting up the blob is rather high. Once you've got it woken up, you might as well pull down a lot of points. Each row in Oracle has a roughly fixed cost, so the bigger the data in the rows, the better off you generally are.<br>


<br>

At CRREL, we also partition the table so that big chunks of it (mapped to logical collections of data, like a date of collection or something) can be dropped/updated all at once. I'll explore what it would take to do that as well.<br>


<br>

<br>

</blockquote></div>