[postgis-users] Re: Massive Lidar Dataset Datatype Suggestions?

Purvis, Charlton cpurvis at asg.sc.edu
Mon Nov 15 08:16:55 PST 2004


Hi, all:


I think I would be tarred and feathered if I didn't chime in.  We deal w/
near real-time ocean observations, but it is the remote sensing and model
forecast data products that seem to align most closely w/ this thread.  I
have to admit that I skimmed the emails, and I'm not sure if the data you're
attempting to store is to be used for visualization or data mining, but
maybe our application of PostGIS can apply to both.


Let's take the model forecast product as an example.  We have at least two
hurdles to overcome:  our model data is about 5 days' worth of water level,
currents, winds, and air pressure for the SE US Atlantic.  It is refreshed
daily and comes to over 10 million points of data.  It all starts out as
netCDF, but leaving it in that form to produce images, animations, and time
series would be killer.  We're entering y3 of our efforts, and I think we've
got a reasonable solution in place.


Since our model data is hourly, I break the data into one table per hour and
index each table by space.  This works well from an interface point of view,
since a user can only look at a one-hour snapshot of all our data.  It's a
little more of a problem to produce something like a time series, since that
requires me to JOIN the tables to build a linearly aggregated product.  But
it's still efficient.  At least much more efficient than my first try at
keeping all the data in one table, indexing it by space and time, and then
CLUSTER-ing it.  That was a mess and didn't help much in the end.
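To make the per-hour scheme concrete, here is a minimal sketch (hypothetical
table names and columns, not my actual schema) of routing a timestamp to its
hourly table and building the UNION needed for a time-series query:

```python
from datetime import datetime

def hourly_table(ts: datetime, prefix: str = "model_fcst") -> str:
    """Map a timestamp to the per-hour table holding that snapshot."""
    return f"{prefix}_{ts:%Y%m%d_%H}"

def timeseries_union(hours, prefix: str = "model_fcst") -> str:
    """Build a UNION ALL across hourly tables for a time-series query.
    Per-hour tables keep each spatial index small, at the cost of a
    UNION/JOIN whenever a query spans time."""
    selects = [
        f"SELECT '{h:%Y-%m-%d %H:00}' AS snapshot, the_geom, value "
        f"FROM {hourly_table(h, prefix)}"
        for h in hours
    ]
    return "\nUNION ALL\n".join(selects)
```

The table-name convention and the `the_geom`/`value` columns are assumptions
for illustration; the point is only that the routing is a pure function of
the timestamp, so index creation stays consistent hour after hour.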


In addition to breaking out the products by time, I create different
aggregations of the data by granularity.  Keeping in mind that we are
primarily driven by producing snappy visualizations, I decided that I'd
break up the data further into 5 different zoom levels, i.e. levels of
granularity.  If the user is looking at the entire SE US, there is no need
to have the source data finely granular - simply hit the tables that contain
data appropriate for that extent.
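A rough sketch of that idea (the cutoff widths and decimation factors here
are made-up numbers, not the ones we actually use): pick a granularity level
from the viewport width, then thin the source points accordingly.

```python
def zoom_level(extent_width_deg: float) -> int:
    """Pick one of 5 granularity levels from the viewport width in
    degrees: level 0 = coarsest (whole SE US), level 4 = finest."""
    thresholds = [8.0, 4.0, 2.0, 1.0]  # assumed cutoffs, degrees
    for level, cutoff in enumerate(thresholds):
        if extent_width_deg >= cutoff:
            return level
    return 4

def decimate(points, level, max_level=4):
    """Keep every Nth point; coarser levels keep exponentially fewer.
    In practice each level would be a pre-built table, not computed
    on the fly."""
    step = 2 ** (max_level - level)
    return points[::step]
```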


To recap:  our data is updated in near real-time, so index creation has to
be consistent and relatively painless.  I've established a round-robin
scheme such that while machine A is being updated, machine B (which has a
copy of machine A's data) accepts DB queries.  When machine A is done,
queries are directed back to A's DB, and then B's DB is updated.  The tables
are created as hourly snapshots of the data, and then they are broken down
into 5 zoom levels (at least the remotely sensed data follows this pattern).
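The round-robin swap can be sketched in a few lines (hypothetical host names;
the real mechanics would live in whatever directs queries to a host):

```python
class RoundRobinPair:
    """Two mirrored DB hosts: one serves queries while the other
    loads the fresh hourly data, then the roles swap."""

    def __init__(self, host_a: str, host_b: str):
        self.active, self.standby = host_a, host_b

    def query_host(self) -> str:
        """Host that should receive user queries right now."""
        return self.active

    def update_host(self) -> str:
        """Host currently free to take the data refresh."""
        return self.standby

    def swap(self) -> None:
        """Call once the standby has finished loading the new data."""
        self.active, self.standby = self.standby, self.active
```

The design point is simply that updates never touch the host serving
queries, so users never see a half-built index.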


To get a taste of what I'm talking about, look here:

http://nautilus.baruch.sc.edu/seacoos_misc/show_sea_coos_obs_time_ranges.php


The MODEL data layers (the last two on that page) and the QuikSCAT wind
follow the pattern I describe above.


Charlton

Charlton Purvis

(803) 777-8858 : voice

(803) 777-3935 : fax

cpurvis at sc.edu


Baruch Institute

University of South Carolina

Columbia, SC 29208


  _____  

From: Yves Moisan [mailto:ymoisan at groupesm.com] 
Sent: Monday, November 15, 2004 8:43 AM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Re: Massive Lidar Dataset Datatype Suggestions?


 Hi, 

I am also still pondering how the heck I will be storing potentially large
amounts of water quality [point] data.  Integrating on space as Paul
suggests is interesting, but other integration schemes could be useful, one
being integration of the data "by object" (e.g. sensor, station ...). 

In the example I am thinking of, a bunch of point data could be boxed both
by time and sensor in the form of a single netCDF file (integration on
object=sensor) for an arbitrary time bin (e.g. a day, a week ...). 

I am still very hesitant as to which path is best.  Wouldn't a netCDF file
also let me include all the relevant metadata, which I could make sure meets
some standard (e.g. FGDC-CSDGM), instead of potentially having to put that
metadata in PostgreSQL or an XML database?  Would the spatial querying
machinery be efficient if the data were stored in netCDF files, e.g. could I
still keep just the coordinates of my data points in PostGIS, with a 3rd
field being some sort of pointer to a BLOB in the form of a netCDF file?  I
think if it is just for spatial queries, such a setup would be fine.  But
what if I wanted to further parametrize my queries by some attribute data
(e.g. give me all point measurements where some valueOrParameter falls
between thresholds A and B)?  I guess, depending on volume, netCDF files
could be opened from within PostgreSQL without it being too heavy an
operation? 
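One way to sketch that hybrid idea (all names here are hypothetical; a real
setup would run stage 1 as a PostGIS spatial query and stage 2 through a
netCDF reader such as the netCDF4 library, not plain Python lists):

```python
# Each indexed row carries only coordinates plus a pointer into a
# netCDF file.  Stage 1 (spatial) runs against the index; stage 2
# (attributes) opens only the files the spatial filter returned.

def spatial_filter(rows, bbox):
    """Stage 1: keep rows whose (lon, lat) fall inside the bbox,
    given as (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    return [r for r in rows
            if min_lon <= r["lon"] <= max_lon
            and min_lat <= r["lat"] <= max_lat]

def attribute_filter(rows, read_value, lo, hi):
    """Stage 2: dereference each row's file pointer via the supplied
    reader and keep measurements with lo < value < hi."""
    return [r for r in rows if lo < read_value(r["file"], r["index"]) < hi]
```

The open question Yves raises is exactly the cost of stage 2: every
attribute predicate forces the BLOB/file to be opened, which only stays
cheap if the spatial filter has already cut the candidate set down hard.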

Your problem is one of sheer data volume and calls for some integration
mechanism, but I think one doesn't have to have a data volume problem to
realize that data integration is, in my opinion, a much more general problem
for all of us. 

Let us know what solution you choose.  I too am very much interested. 

Yves Moisan 

Gerry Creager N5JXS wrote: 



Hmmm... Can we start thinking in terms of a NetCDF data structure? 
