[postgis-users] Massive Lidar Dataset Datatype Suggestions?

Sat Nov 13 15:09:41 PST 2004

The server we will be using for production is a Sun Fire V440: two 
1.7 GHz UltraSPARC processors (SMP), 16 GB RAM, and a 1 terabyte SCSI320 
RAID system.  My test machine was just what I had available to play with.

Some responses below.

Paul Ramsey wrote:
> 
> On 13-Nov-04, at 12:34 PM, collin wrote:
> 
>> I am trying to figure the best setup for storing, extracting and 
>> processing this dataset.  btw, it is a smallish dataset. We will be 
>> processing 2 billion+ point projects in the near future.
> 
> 
> The key here is "the best setup for storing, extracting and processing". 
> You are talking about non-trivial amounts of data and processing tasks, 
> so the decisions you make about storage will have large downstream 
> effects. The "right decisions" are dictated by what you are actually 
> going to *do* with the data.  How are you going to be querying it?  What 
> variables and variable combinations will you be using?

The primary use will be for internal processing, i.e. point 
classification.  I am hoping to do this by running windowing 
functions through the dataset from plpgsql.  Queries in this form are 
mostly bounding boxes, and I'll mostly be processing Z values and 
intensity.
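
To make the windowing idea concrete, here is a minimal sketch in Python rather than plpgsql. Everything in it is an assumption for illustration: the (x, y, z, intensity) tuple layout, the sample values, and the function names. In the database the bounding-box filter would be a spatial query served by the index rather than a list scan.

```python
from statistics import mean

# Hypothetical record layout: (x, y, z, intensity). Sample values are made up.
points = [
    (0.5, 0.5, 10.0, 40),
    (1.5, 0.5, 12.0, 55),
    (0.5, 1.5, 11.0, 35),
    (1.5, 1.5, 30.0, 90),
    (2.5, 2.5, 13.0, 60),
]

def points_in_bbox(pts, xmin, ymin, xmax, ymax):
    """Return points inside the half-open box [xmin, xmax) x [ymin, ymax)."""
    return [p for p in pts if xmin <= p[0] < xmax and ymin <= p[1] < ymax]

def window_stats(pts, xmin, ymin, xmax, ymax, size, step):
    """Slide a size-by-size window across the extent, summarising Z and intensity."""
    results = []
    y = ymin
    while y < ymax:
        x = xmin
        while x < xmax:
            hits = points_in_bbox(pts, x, y, x + size, y + size)
            if hits:  # skip empty windows
                results.append({
                    "bbox": (x, y, x + size, y + size),
                    "count": len(hits),
                    "min_z": min(p[2] for p in hits),
                    "mean_intensity": mean(p[3] for p in hits),
                })
            x += step
        y += step
    return results

# Overlapping 2x2 windows, moved one unit at a time.
stats = window_stats(points, 0.0, 0.0, 3.0, 3.0, size=2.0, step=1.0)
```

Each entry in `stats` describes one non-empty window; a classification rule would then compare each point's Z and intensity against its window's summary.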

For the occasional extraction, we will be extracting by watershed 
polygon.  This system would not be live to the public, nor is it going 
to have multiple transactions occurring simultaneously (much).

I am uncertain what to do with the first and last returns, since they 
are both 3D points.  Can you have two separate geometry columns in one 
row?

> Do you really need to store every point as a separate row, for example? 
>  One "easy" way to cut down your storage and index size would be to 
> store your LIDAR points as MULTIPOINT patches. Simply cut up your 
> working plane with an arbitrary grid system and patch the points 
> together based on their x/y values and what grid cell they fall into.  
> Depending on the importance of the extra point attributes for your 
> downstream processing plans, this simplification might be a very smart one.

This is an interesting idea, but elevation and intensity are the primary 
information we use.  We also need to know which flight line each point 
came from and whether it is a first or last return.  So I can't see 
multipoint helping much, unless I'm misunderstanding you.
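
As a rough illustration of the patching idea, the sketch below bins points into square grid cells by their x/y values. It is Python rather than SQL, and the cell size, the tuple layout, and the function names are all assumptions made up for the example. A plain MULTIPOINT carries coordinates only, so the sketch keeps each point's attributes in the patch alongside the coordinates to show what would have to be stored separately.

```python
from collections import defaultdict
from math import floor

CELL_SIZE = 10.0  # assumed grid spacing, in the same units as x and y

# Hypothetical layout: (x, y, z, intensity, flight_line, return_type)
points = [
    (3.0, 4.0, 101.2, 40, 7, "first"),
    (8.5, 1.0, 100.9, 38, 7, "last"),
    (12.0, 4.0, 99.5, 52, 8, "first"),
    (3.5, 14.0, 102.0, 47, 8, "last"),
]

def cell_of(x, y, cell_size=CELL_SIZE):
    """Return the (column, row) index of the grid cell containing (x, y)."""
    return (floor(x / cell_size), floor(y / cell_size))

def build_patches(pts, cell_size=CELL_SIZE):
    """Group points by grid cell; each patch would become one row."""
    patches = defaultdict(list)
    for p in pts:
        patches[cell_of(p[0], p[1], cell_size)].append(p)
    return dict(patches)

patches = build_patches(points)
for cell, members in sorted(patches.items()):
    print(cell, len(members))
```

The row count and index size drop from one entry per point to one per cell, but any query on intensity, flight line, or return type has to unpack the patch first.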

Would some form of physical indexing help? i.e. creating a new table 
where the points are inserted by proximity?  Or will the indexes work 
sufficiently well to make this unnecessary?  I ask this, because I am 
not yet convinced placing the points into a database is necessarily the 
right way to go.
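
One way to picture "inserting by proximity" is to sort the points by a space-filling-curve key before loading them, so that rows close together on disk are also close together on the ground. The sketch below uses a Morton (Z-order) key. That is a technique swapped in for illustration, not something proposed in this thread, and the origin, cell size, and function names are assumptions.

```python
def interleave_bits(v, bits=16):
    """Spread the low `bits` bits of v so a zero bit sits between each pair."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (2 * i)
    return out

def morton_key(x, y, origin=(0.0, 0.0), cell_size=1.0, bits=16):
    """Map (x, y) to a Z-order key by interleaving its grid column and row."""
    col = int((x - origin[0]) // cell_size)
    row = int((y - origin[1]) // cell_size)
    limit = 1 << bits
    if not (0 <= col < limit and 0 <= row < limit):
        raise ValueError("point falls outside the assumed grid extent")
    return interleave_bits(col, bits) | (interleave_bits(row, bits) << 1)

# Made-up sample coordinates; only x and y matter for the ordering.
points = [(5.2, 0.1), (0.3, 0.4), (1.7, 1.2), (0.9, 1.8), (1.1, 0.2)]

# Load order: points in neighbouring cells end up next to each other.
ordered = sorted(points, key=lambda p: morton_key(p[0], p[1]))
```

Whether this beats relying on the spatial index alone would need to be measured on the real data.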

> Really, the key is what your downstream processing regime will be. 
> Regardless, get the LWGEOM working, HWGEOM is really inappropriate for 
> point data.
> Paul

I agree completely.  Getting LWGEOM working is my goal for the weekend :-)

________________
Collin Bode
GIS Informatics Researcher
Power Lab, Integrative Biology
University of California, Berkeley