[postgis-users] Massive Lidar Dataset Datatype Suggestions?
collin
collin at socrates.berkeley.edu
Sat Nov 13 15:09:41 PST 2004
The server we will be using for production is a Sun Fire V440: a
2-processor SMP box with 1.7 GHz UltraSPARCs, 16 GB RAM, and a 1 terabyte
SCSI320 RAID system. My test machine was just what I had available to
play with.
Some responses below.
Paul Ramsey wrote:
>
> On 13-Nov-04, at 12:34 PM, collin wrote:
>
>> I am trying to figure the best setup for storing, extracting and
>> processing this dataset. btw, it is a smallish dataset. We will be
>> processing 2 billion+ point projects in the near future.
>
>
> The key here is "the best setup for storing, extracting and processing".
> You are talking about non-trivial amounts of data and processing tasks,
> so the decisions you make about storage will have large downstream
> effects. The "right decisions" are dictated by what you are actually
> going to *do* with the data. How are you going to be querying it? What
> variables and variable combinations will you be using?
The primary use will be for internal processing, i.e. point
classification. I am hoping to do this by running windowing
functions through the dataset from plpgsql. Queries in this form are
mostly bounding boxes, and I'll mostly be processing Z values and
intensity.
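
A minimal sketch of the bounding-box windowing described above, written
in plain Python rather than plpgsql so that it runs on its own. The
(x, y, z, intensity) tuple layout and the function names are assumptions
made for illustration, not part of the dataset or of PostGIS.

```python
from statistics import mean

# Hypothetical point record: (x, y, z, intensity).

def points_in_bbox(points, xmin, ymin, xmax, ymax):
    """Return the points whose x/y fall inside the box (edges inclusive)."""
    return [p for p in points
            if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]

def window_stats(points, xmin, ymin, xmax, ymax):
    """Summarise Z and intensity for one window; None if it is empty."""
    hits = points_in_bbox(points, xmin, ymin, xmax, ymax)
    if not hits:
        return None
    zs = [p[2] for p in hits]
    return {
        "count": len(hits),
        "min_z": min(zs),
        "max_z": max(zs),
        "mean_intensity": mean(p[3] for p in hits),
    }
```

This scans every point for every window, which is exactly the cost a
spatial index is meant to avoid on a dataset of this size.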
For the occasional extraction, we will be extracting by watershed
polygon. This system would not be live to the public, nor is it going to
have multiple transactions occurring simultaneously (much).
I am uncertain what to do with the first and last returns, since they
are both 3D points. Can you have two separate geometry columns in one
row?
> Do you really need to store every point as a separate row, for example?
> One "easy" way to cut down your storage and index size would be to
> store your LIDAR points as MULTIPOINT patches. Simply cut up your
> working plane with an arbitrary grid system and patch the points
> together based on their x/y values and what grid cell they fall into.
> Depending on the importance of the extra point attributes for your
> downstream processing plans, this simplification might be a very smart one.
This is an interesting idea, but elevation and intensity are the primary
information we use. Also which flight lines the points came from and
whether the point is first or last return. So I can't see multipoint
helping much, unless I'm misunderstanding you.
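
For reference, the grid patching suggested above might look roughly like
the sketch below. The cell size, the function names, and the exact
MULTIPOINT text layout are assumptions for illustration; the text form a
given PostGIS version accepts for 3D multipoints should be checked
against its documentation.

```python
import math
from collections import defaultdict

def patch_points(points, cell_size):
    """Group (x, y, z) points by the grid cell their x/y falls into."""
    patches = defaultdict(list)
    for x, y, z in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        patches[cell].append((x, y, z))
    return dict(patches)

def to_multipoint_wkt(pts):
    """Render one patch as WKT-style text (illustrative format only)."""
    coords = ", ".join(f"{x:g} {y:g} {z:g}" for x, y, z in pts)
    return f"MULTIPOINT({coords})"
```

Note that the patch keeps only x, y and z. Intensity, flight line, and
the first/last return flag have no place in the geometry, which is the
concern raised above.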
Would some form of physical indexing help? i.e. creating a new table
where the points are inserted by proximity? Or will the indexes work
sufficiently well to make this unnecessary? I ask this, because I am
not yet convinced placing the points into a database is necessarily the
right way to go.
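
One possible way to get "inserted by proximity" is to sort the points on
a space-filling-curve key before loading them, so that neighbours on the
ground tend to be neighbours on disk. The sketch below uses a Z-order
(Morton) key; this is a swapped-in technique chosen for illustration,
not something proposed in this thread, and the origin and cell_size
parameters are hypothetical.

```python
def interleave_bits(ix, iy, bits=16):
    """Interleave the low `bits` bits of two non-negative cell indices."""
    key = 0
    for i in range(bits):
        key |= ((ix >> i) & 1) << (2 * i)        # x bits: even positions
        key |= ((iy >> i) & 1) << (2 * i + 1)    # y bits: odd positions
    return key

def morton_sort(points, origin, cell_size):
    """Sort points by the Z-order key of the grid cell they fall into.

    Assumes every point lies at or beyond `origin` in both x and y, so
    the cell indices are non-negative.
    """
    ox, oy = origin

    def key(p):
        ix = int((p[0] - ox) // cell_size)
        iy = int((p[1] - oy) // cell_size)
        return interleave_bits(ix, iy)

    return sorted(points, key=key)
```

Whether this beats relying on the spatial index alone would need to be
measured on the real data.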
> Really, the key is what your downstream processing regime will be.
> Regardless, get the LWGEOM working, HWGEOM is really inappropriate for
> point data.
> Paul
I agree completely. Getting LWGEOM working is my goal for the weekend :-)
________________
Collin Bode
GIS Informatics Researcher
Power Lab, Integrative Biology
University of California, Berkeley