[pgpointcloud] Questions about indexing and querying

Howard Butler howard at hobu.co
Sat Dec 14 16:33:23 PST 2013


On Dec 14, 2013, at 5:43 PM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:

>> So I have endeavored to process and load the entire State of Iowa's 
>> LiDAR data into pgpointcloud using PDAL. I currently have 1.24  
>> million patches at ~120k points per patch. I have some questions  
>> on how to use this holding more effectively.
>> 
>> 1) Is there an equivalent to this when the pointcloud_postgis  
>> extension is enabled? Or did I miss something? The docs don't  
>> say too much about indexing other than using the chipper to create  
>> patches.
> 
> Arg, yeah, there’s a big hole in indexing currently. Basically you have to enable pointcloud_postgis and then
> 
> CREATE INDEX patches_idx on cloud using GIST (Geometry(patchcol));
> 
> And since it’s a functional index, you then need to make sure that one of your terms is “Geometry(patchcol)” when trying to invoke it.
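
So if I read that right, a bounding-box query shaped like the following should hit that index (a sketch reusing Paul's table and column names; the envelope coordinates and SRID are placeholders I made up):

SELECT Count(*)
FROM cloud
WHERE Geometry(patchcol) && ST_MakeEnvelope(437000, 4640000, 438000, 4641000, 26915);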

Does this mean on-the-fly pc-to-geom conversion of every candidate patch? Is that expensive? Would it be worth caching an explicit geometry for each patch/row in my table and simply interacting with that spatially?
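
If caching is the way to go, I imagine it would look something like this (a sketch; the env column name is mine, and I'm assuming Geometry() on a patch returns its envelope, as Paul's index implies):

ALTER TABLE cloud ADD COLUMN env geometry;
UPDATE cloud SET env = Geometry(patchcol);
CREATE INDEX cloud_env_idx ON cloud USING GIST (env);

Queries would then filter on env directly rather than computing Geometry(patchcol) per candidate.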


>> 2) Does a simple count, à la "select sum(PC_NumPoints(pa)) from  
>> cloud", mean unpacking every patch all the way?
> 
> No, the npoints is stored in the header, so it should only mean a peek into the front of the page. 

My PC_NumPoints query still hasn't returned after a few hours. Does fetching the header mean lighting up the entire object? I ask because I wonder whether I should instead store the data as more rows with smaller patches, rather than bigger, fuller patches. Using 400-point patches would mean ~390 million patches/rows for this data (the same roughly 150 billion points, just split much finer). Do tables with that many rows behave well? For massive point cloud data, the choice is either lots of rows or fat rows. Which way does PostgreSQL seem to favor?
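
As a side note, I assume I can at least check how much of the table lives out-of-line in TOAST (which I take to be the "lighting up" cost in question) with the standard size functions:

SELECT pg_size_pretty(pg_relation_size('cloud'))       AS main_heap,
       pg_size_pretty(pg_total_relation_size('cloud')) AS with_toast_and_indexes;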

In Oracle land, we would want fat rows, because the cost of lighting up the BLOB is rather high. Once you've got it woken up, you might as well pull down a lot of points. Each row in Oracle has a roughly fixed cost, so the bigger the data in the rows, the better off you generally are.

At CRREL, we also partition the table so that big chunks of it (mapped to logical collections of data, like a date of collection) can be dropped or updated altogether. I'll explore what it would take to do that here as well; a rough sketch of what I have in mind follows.
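
My understanding is that in PostgreSQL today this means inheritance plus CHECK constraints (a hypothetical sketch; the capture_date column doesn't exist in my schema yet):

-- one child table per collection; with constraint_exclusion on, the
-- planner skips children whose date range can't match the query
CREATE TABLE cloud_201312 (
    CHECK (capture_date >= DATE '2013-12-01'
       AND capture_date <  DATE '2014-01-01')
) INHERITS (cloud);

-- retiring an entire collection is then just a DROP:
DROP TABLE cloud_201312;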

