[pgpointcloud] Questions about indexing and querying

James Klassen klassen.js at gmail.com
Sun Dec 15 12:27:03 PST 2013


This may or may not be related, but a few months ago when I was working
with pgpointcloud I noticed there was a slow way and a fast way to run some
queries.  I haven't had a chance to work with it lately, but if memory
serves (which it may not), there was a large performance difference on
large datasets (Twin Cities metro area lidar) between using the WITH syntax
for the select statement and using more typical joins.

IIRC, WITH would end up building a temp table on disk and waiting until
all rows were processed before returning the first record.  A single-level
select would stream results back as they were calculated and didn't use
temp disk space.
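
From memory, the slow form was something roughly like this (table and
column names borrowed from Howard's example below, and the bounding box
coordinates are just placeholders):

    WITH candidates AS (
        SELECT pa FROM cloud
        WHERE Geometry(pa) &&
              ST_MakeEnvelope(480000, 4970000, 490000, 4980000, 26915)
    )
    SELECT Sum(PC_NumPoints(pa)) FROM candidates;

versus the single-level form over the same table:

    SELECT Sum(PC_NumPoints(pa))
    FROM cloud
    WHERE Geometry(pa) &&
          ST_MakeEnvelope(480000, 4970000, 490000, 4980000, 26915);

Again, this is from memory and untested, so take it with a grain of salt.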
On Dec 14, 2013 6:33 PM, "Howard Butler" <howard at hobu.co> wrote:

>
> On Dec 14, 2013, at 5:43 PM, Paul Ramsey <pramsey at cleverelephant.ca>
> wrote:
>
> >> So I have endeavored to process and load the entire State of Iowa's
> >> LiDAR data into pgpointcloud using PDAL. I currently have 1.24
> >> million patches at ~120k points per patch. I have some questions
> >> on how to use this holding more effectively.
> >>
> >> 1) Is there an equivalent to this when the pointcloud_postgis
> >> extension is enabled? Or did I miss something? The docs don't
> >> say too much about indexing other than using the chipper to create
> >> patches.
> >
> > Arg, yeah, there’s a big hole in indexing currently. Basically you have
> > to enable pointcloud_postgis and then
> >
> > CREATE INDEX patches_idx on cloud using GIST (Geometry(patchcol));
> >
> > And since it’s a functional index, you then need to make sure that one
> > of your terms is “Geometry(patchcol)” when trying to invoke it.
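> >
> > For example (illustrative only, not tested against this table, and the
> > envelope coordinates are just placeholders), a query along these lines
> > should be able to use that index:
> >
> >     SELECT Sum(PC_NumPoints(patchcol))
> >     FROM cloud
> >     WHERE Geometry(patchcol) &&
> >           ST_MakeEnvelope(480000, 4970000, 490000, 4980000, 26915);
> >
> > A query whose filter never mentions Geometry(patchcol) won’t be able to
> > use it.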
>
> Does this mean on-the-fly pc-to-geom conversion of every candidate? Is that
> expensive? Is it worth it to cache an explicit geometry for each patch/row
> in my table and simply interact with that spatially?
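>
> Something like this is what I have in mind (sketch only, untested, using
> the column names from my table):
>
>     ALTER TABLE cloud ADD COLUMN geom geometry;
>     UPDATE cloud SET geom = Geometry(pa);
>     CREATE INDEX cloud_geom_idx ON cloud USING GIST (geom);
>
> and then querying against geom directly instead of Geometry(pa).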
>
>
> >> 2) Does a simple count, ala "select sum(PC_NumPoints(pa)) from
> >> cloud", mean unpacking every patch all the way?
> >
> > No, the npoints is stored in the header, so it should only mean a peek
> > into the front of the page.
>
> My PC_NumPoints query still hasn't returned an answer after a few hours.
> Does fetching the header mean lighting up the entire object? I ask because
> I wonder if I should store the data in more, smaller patches/rows instead
> of bigger, fuller patches. Using 400-point patches would mean ~390 million
> patches/rows for this data. Do tables with that many rows work out well?
> For massive point cloud data, the choice is either lots of rows or fat
> rows. Which does PostgreSQL tend to favor?
>
> In Oracle land, we would want fat rows because the cost of lighting up the
> blob is rather high. Once you've got it woken up, you might as well pull
> down a lot of points. Each row in Oracle has a roughly fixed cost, so the
> bigger the data in the rows, the better off you generally are.
>
> At CRREL, we also partition the table so that big chunks of it (mapped to
> logical collections of data, like a date of collection or something) can
> be dropped/updated altogether.  I'll explore what it would take to do that
> as well.
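>
> On the PostgreSQL side, presumably that means the usual
> inheritance-plus-CHECK-constraint approach, something like this (sketch
> only; the collection_date column is hypothetical):
>
>     CREATE TABLE cloud_20131201 (
>         CHECK (collection_date = DATE '2013-12-01')
>     ) INHERITS (cloud);
>
> With constraint_exclusion on, queries that filter on collection_date
> should only touch the relevant child tables, and a whole collection can
> be dropped with a single DROP TABLE.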