[postgis-users] Re: Massive Lidar Dataset Datatype Suggestions?

Mon Nov 15 09:28:10 PST 2004

Just wanted to post this, to make sure that I am finding the right 
definition of netCDF. This is my first exposure to the term.

What Is netCDF?

NetCDF (network Common Data Form) is an interface for array-oriented data 
access and a freely-distributed collection of software libraries for C, 
Fortran, C++, Java, and perl that provide implementations of the interface. 
The netCDF software was developed by Glenn Davis, Russ Rew, Steve Emmerson, 
John Caron, and Harvey Davies at the Unidata Program Center in Boulder, 
Colorado, and augmented by contributions from other netCDF users. The 
netCDF libraries define a machine-independent format for representing 
scientific data. Together, the interface, libraries, and format support the 
creation, access, and sharing of scientific data.

NetCDF data is:
Self-Describing. A netCDF file includes information about the data it contains.
Architecture-independent. A netCDF file is represented in a form that can 
be accessed by computers with different ways of storing integers, 
characters, and floating-point numbers.
Direct-access. A small subset of a large dataset may be accessed 
efficiently, without first reading through all the preceding data.
Appendable. Data can be appended to a netCDF dataset along one dimension 
without copying the dataset or redefining its structure. The structure of a 
netCDF dataset can be changed, though this sometimes causes the dataset to 
be copied.
Sharable. One writer and multiple readers may simultaneously access the 
same netCDF file.

At 11:16 AM 11/15/2004 -0500, you wrote:

>Hi, all:
>
>
>
>I think I would be tarred and feathered if I didn't chime in.  We deal w/ 
>near real-time ocean observations, but it is the remote sensing and model 
>forecast data products that seem to align most closely w/ this thread.  I 
>have to admit that I skimmed the emails, and I'm not sure if the data you're 
>attempting to store is to be used for visualization or data mining, but 
>maybe our application of PostGIS can apply to both.
>
>
>
>Let's take the model forecast product as an example.  We have at least two 
>hurdles to overcome:  our model data is about 5 days' worth of water level, 
>currents, winds, air pressure for the SE US Atlantic.  It is refreshed 
>daily and is over 10 million points of data.  It all starts out as netCDF, 
>but leaving it in that form to produce images and animations and time 
>series would be killer.  We're entering year 3 of our efforts, and I think 
>we've got a reasonable solution in place.
>
>
>
>Since our model data is hourly, I break the data into one table per hour 
>and index each table by space.  This works well from an interface point of 
>view since a user can only look at a one-hour snapshot of all our data at a 
>time.  It's a little more of a problem to look at something like time series 
>since that requires me to JOIN the tables to produce a linearly aggregated 
>product.  But it's still efficient.  At least much more efficient than my 
>first try at keeping all the data in one table, indexing it by space and 
>time, and then CLUSTER-ing it.  That was a mess and didn't help much in the end.
>
>
>
>In addition to breaking out the products by time, I create different 
>aggregations of the data by granularity.  Keeping in mind that we are 
>primarily driven by producing snappy visualizations, I decided that I'd 
>break up the data further into 5 different zoom levels, i.e. levels of 
>granularity.  If the user is looking at the entire SE US, there is no need 
>to have the source data finely granular; simply hit the tables that contain 
>data appropriate for that extent.
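
The zoom-level idea above could be sketched roughly as follows. The five
levels come from the message; the width thresholds, the table suffix, and
the function names are hypothetical and only show how a requested extent
might be mapped to the table holding data of suitable granularity.

```python
# (level, minimum extent width in degrees of longitude), coarsest first.
# These thresholds are invented for illustration.
ZOOM_LEVELS = [(1, 10.0), (2, 5.0), (3, 2.0), (4, 0.5), (5, 0.0)]


def zoom_level_for_extent(min_lon, max_lon):
    """Pick the coarsest level that is still adequate for the extent."""
    width = max_lon - min_lon
    for level, min_width in ZOOM_LEVELS:
        if width >= min_width:
            return level
    return ZOOM_LEVELS[-1][0]  # degenerate extent: use the finest level


def table_for(product, hour_label, min_lon, max_lon):
    """Hypothetical table name combining product, hour, and zoom level."""
    level = zoom_level_for_extent(min_lon, max_lon)
    return f"{product}_{hour_label}_z{level}"


# Whole-region view hits the coarse table, a harbor-scale view the fine one.
print(table_for("winds", "20041115_09", -85.0, -70.0))
print(table_for("winds", "20041115_09", -80.2, -80.0))
```
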
>
>
>
>To recap:  We have to consider the fact that our data is updated near 
>real-time, so index creation has to be consistent and relatively 
>painless.  I've established a round-robining scheme such that while machine 
>A is being updated, machine B (which has a copy of machine A's data) 
>accepts DB queries.  Then when machine A is done, queries are returned to 
>A's DB, and then B's DB is updated.  The tables are created as hourly 
>snapshots of the data.  And then they are broken down into 5 zoom levels 
>(at least the remotely sensed data follows this pattern).
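
The round-robin update could look something like the sketch below. It
only models the ordering described above (B answers queries while A is
updated, then A answers while B is updated); the class name, the load
callback, and the way queries are routed are all assumptions.

```python
class RoundRobinPair:
    """Two mirrored DB machines; one serves queries while the other loads."""

    def __init__(self, a, b):
        self.servers = [a, b]
        self.active = 0  # index of the machine currently answering queries

    def query_target(self):
        return self.servers[self.active]

    def refresh(self, load):
        """Update each machine in turn; return (updated, serving) pairs."""
        log = []
        for target in (0, 1):
            self.active = 1 - target           # route queries to the other one
            load(self.servers[target])         # rebuild tables and indexes here
            log.append((self.servers[target], self.query_target()))
        return log


pair = RoundRobinPair("A", "B")
steps = pair.refresh(lambda server: None)  # no-op loader for the example
print(steps, pair.query_target())
```
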
>
>
>
>To get a taste of what I'm talking about, look here:
>
>http://nautilus.baruch.sc.edu/seacoos_misc/show_sea_coos_obs_time_ranges.php
>
>
>
>The MODEL data layers (last two on that page) and the QuikSCAT wind 
>follow the pattern I describe above.
>
>
>
>Charlton
>
>
>
>
>
>
>
>
>
>Charlton Purvis
>
>(803) 777-8858 : voice
>
>(803) 777-3935 : fax
>
>cpurvis at sc.edu
>
>
>
>Baruch Institute
>
>University of South Carolina
>
>Columbia, SC 29208
>
>
>
>----------
>From: Yves Moisan [mailto:ymoisan at groupesm.com]
>Sent: Monday, November 15, 2004 8:43 AM
>To: postgis-users at postgis.refractions.net
>Subject: [postgis-users] Re: Massive Lidar Dataset Datatype Suggestions?
>
>
>
>  Hi,
>
>I am also still pondering how the heck I will be storing potentially large 
>amounts of water quality [point] data.  Integrating on space as Paul 
>suggests is interesting, but other integration schemes could be useful, 
>one being integration of the data "by object" (e.g. sensor, station ...).
>
>In the example I am thinking of, a bunch of point data could be boxed both 
>by time and sensor in the form of a single netCDF file (integration on 
>object=sensor) for an arbitrary time bin (e.g. a day, a week ...).
>
>I am still very hesitant as to what path is best.  Wouldn't a netCDF file 
>allow me to put all the relevant metadata as well that I could make sure 
>meets some standard (e.g. FGDC-CSDGM) instead of potentially having to put 
>that metadata in postgreSQL or an XML database ?  Would the spatial 
>querying machinery be efficient if the data were stored in netCDF files, 
>e.g. could I still use just the coordinates of my data points in postGIS 
>with a 3rd field being some sort of pointer to a BLOB in the form of a 
>netCDF file ?  I think if it is just for spatial queries, such a set up 
>would be fine.  But what if I wanted to further parametrize my queries by 
>some attribute data (e.g. give me all point measurements < 
>valueOrParameter=A > valueOrParameter=B) ?  I guess depending on volume 
>netCDF files could be opened from within postgreSQL without it being too 
>heavy an operation ?
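
A rough sketch of the "coordinates plus pointer" layout raised above,
under heavy assumptions: SQLite stands in for PostGIS, the table and
column names are invented, and the file paths are made up and never
opened. The spatial part of a query is answered from the database alone
and returns pointers to the netCDF files.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per sensor and time bin: a position plus a pointer to the file.
conn.execute(
    "CREATE TABLE obs_files (sensor TEXT, lon REAL, lat REAL, nc_path TEXT)"
)
conn.executemany(
    "INSERT INTO obs_files VALUES (?, ?, ?, ?)",
    [
        ("st_a", -71.2, 46.8, "nc/st_a_20041115.nc"),  # hypothetical paths
        ("st_b", -73.6, 45.5, "nc/st_b_20041115.nc"),
        ("st_c", -71.0, 46.5, "nc/st_c_20041115.nc"),
    ],
)


def files_in_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return the netCDF pointers for all points inside a bounding box."""
    rows = conn.execute(
        "SELECT nc_path FROM obs_files "
        "WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ? "
        "ORDER BY nc_path",
        (min_lon, max_lon, min_lat, max_lat),
    )
    return [row[0] for row in rows]


hits = files_in_bbox(conn, -72.0, 46.0, -70.0, 47.0)
print(hits)
```

In this layout a filter on measured values cannot be answered from the
table; each file returned by the spatial query would have to be opened and
scanned, which is the cost the question above is asking about.
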
>
>Your problem is one of sheer data volume and calls for some integration 
>mechanism, but I think one doesn't have to have a data volume problem to 
>realize that data integration is, in my opinion, a much more general 
>problem for all of us.
>
>Let us know what solution you chose.  I, too, am very much interested.
>
>Yves Moisan
>
>Gerry Creager N5JXS wrote:
>
>Hmmm... Can we start thinking in terms of a NetCDF data structure?
>_______________________________________________
>postgis-users mailing list
>postgis-users at postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-users

Robert Burgholzer
Environmental Engineer
MapTech Inc.
phone: 804-869-3066
http://www.maptech-inc.com/ 