[Mapserver-users] Sensor Web Enablement (SWE), YSI and MapServer

Yves Moisan ymoisan at groupesm.com
Thu Jul 22 10:42:36 PDT 2004


Hi Gerry,

> I'll try not to sound too pedantic, but it may be a little difficult in
> a couple of places...  In fact, I'll get that out of the way first.

I don't mind pedantic people so long as I can see the attitude is a
manifestation of being passionately involved in one's subject matter
rather than an indication of personality.  Good/strong arguments may make
people appear pedantic, but I don't take those things personally ;).  So
please don't hesitate to correct/criticize/comment.

> <rant>Let's consider that 'Metadata' are data describing the data.  Thus
> we have a method of describing how data are collected: methods,
> instruments, calibration specifics, units, sensitivities, etc.  Your
> catalog of such, below, is a good start.
>
> What often gets missed in the early generation of metadata is a
> consistent method of data characterization... standardization of
> collection, characterization, storage, units, necessary elements, etc.
> The standards piece needs to be considered as either a precursor or an
> equal partner to the creation of metadata and metadata descriptors.
> </rant>
>
> OK.  That's out of my system.  Metadata's a pain to populate, but vital.
>   It's also sexier than creating and adhering to data standards, hence
> my frustration...

Maybe it's a mixture of the fact that I'm French-speaking and the fact
that I am new to this field (never even seen a sonde before!), but I only
get the gist of your rant.  Let me work it out.  Metadata definition: data
about/describing the measured data.  Let's call this metaData.  The "object"
data represent quantities on the thematic object of interest, e.g. the
measured value for dissolved organic carbon, and this relates to the data
collection event and to the data per se.  Let's call this objectData.

For objectData to be displayed on a map, we can make do without metaData.
The worst that can happen is that invalid data (e.g. due to saturation) is
presented on the map.  What I am asked to provide to my "power users" is all
they need to know about the objectData so that they can use it within a
sound scientific context.  I sort of split the metaData tag into subtags
ordered by increasing "distance" from the measurement in the stack of
information.  Let me explain.

Closest in the stack to the measured objectData is metaData 'about' the
data.  That is, anything that is directly or immediately linked to the
measured objectData, e.g.: units (although we could debate about making
this a full-fledged tag by itself instead of just an attribute); location;
date/time; instrument type; calibration data; etc.  The hard part I guess is
coming up with an exhaustive enough (yet not crazy!) list of elements that
qualify as "AboutMetaData".  This is what I am interested in finding a way
of stuffing in the database in a consistent "standard" way.
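
A minimal sketch of what one objectData value plus its "AboutMetaData"
could look like as a Python structure.  The field names come from the list
above; the class names, the lon/lat split and every value in the sample are
made up for illustration, not taken from any standard or from YSI.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AboutMetaData:
    """Metadata immediately linked to one measured value (hypothetical)."""
    units: str
    longitude: float
    latitude: float
    timestamp: datetime
    instrument_type: str
    calibration: dict = field(default_factory=dict)


@dataclass
class Measurement:
    """One objectData value together with its 'about' metadata."""
    parameter: str
    value: float
    about: AboutMetaData


# Sample values are invented purely to show the shape of a record.
sample = Measurement(
    parameter="dissolved_organic_carbon",
    value=4.2,
    about=AboutMetaData(
        units="mg/L",
        longitude=-71.9,
        latitude=45.4,
        timestamp=datetime(2004, 7, 22, 10, 42, tzinfo=timezone.utc),
        instrument_type="multiparameter sonde",
        calibration={"date": "2004-07-01"},
    ),
)
```

Here units is a plain attribute of the metadata; promoting it to a
full-fledged tag would just mean giving it its own class.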

Further away from the measurement in the stack, we want information not
about the actual measured value, but about the object per se, that is
"describing" the data, e.g.: definition of the parameter we are measuring
(which is probably what you call "characterization"); role of this parameter
in assessing problem domains (e.g. what DOC means for water quality
assessment).  That would be the role of a "parent service" that would hold
all the basic definitions and relations between parameters measured by all
types of instruments (I could see manufacturers contributing sensor-specific
peculiarities to such a system, yet to be defined) and potentially that
service could provide some expertise in interpreting the measured value.  We
are not there yet.

A point about "standardization of collection".  I think the set of
instruments defines de facto standards.  With the technology evolving, there
will be new sensors using new methods all the time.  I don't think it is
feasible to make a formal recommendation on a standard of collection.  Just
keeping the information as to what instrument was used and what data
acquisition method was used would be enough to qualify the objectData.  If
we are thinking of a group of citizens who want to test out their pH meters
in the river nearby, we could allow those "data providers" to shove their
data in the DB, but we will need to qualify the data somehow so that it is
clear to a user what that data is and how it can be used, if at all, in any
subsequent quantitative analysis.

> Yellow Springs Instruments?  The thermocouple folks?  I'd be very
> surprised if the DAT format isn't explained or reverse-engineerable with
> a little bit of work...

About the YSI binary format: I tried some reverse engineering, but maybe
with the wrong tools.  I am just reading the file in as ints, floats and
chars with a small Python script to see whether some pattern shows up.  If
you know of a piece of software to do that, that would be great.
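
A minimal sketch of that kind of probing with Python's standard struct
module: decode the same bytes as several numeric types in both byte orders
and look for values that seem plausible.  The probe() helper and the sample
bytes are hypothetical; nothing here reflects the actual YSI .DAT layout.

```python
import struct


def probe(data: bytes, offset: int = 0, count: int = 4) -> dict:
    """Decode `count` values at `offset` under several interpretations."""
    views = {}
    for name, code in (("int16", "h"), ("int32", "i"),
                       ("float32", "f"), ("float64", "d")):
        size = struct.calcsize("<" + code)
        # Never read past the end of the buffer.
        n = max(0, min(count, (len(data) - offset) // size))
        for order in ("<", ">"):  # little-endian, then big-endian
            views[order + name] = struct.unpack_from(
                f"{order}{n}{code}", data, offset)
    # Printable ASCII view of the first 16 bytes, dots for everything else.
    views["ascii"] = "".join(
        chr(b) if 32 <= b < 127 else "." for b in data[offset:offset + 16])
    return views


# Made-up bytes standing in for a file: a 4-char tag, an int32, a float32.
sample = struct.pack("<4sif", b"TEMP", 7, 21.5)
for key, values in probe(sample, offset=4).items():
    print(key, values)
```

Sliding offset one byte at a time and watching which view produces
sensible numbers is usually how field boundaries start to show up.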

> The key is having a set of (standardized) metadata available that allows
> you to determine the data parsing, precision and relative rank
> importance for storage.  A lot of instruments send a lot of data, some
> of which may or may not be necessary, and may actually detract from your
> work by adding confusion.

Right: once I know the structure of the data stream, I can map the byte
sections in the file to a set of metadata and extract the data I want.
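
A minimal sketch of such a mapping, assuming the layout were already known:
a small table ties each byte section to a name, a struct format and units,
and only the requested fields are decoded.  The layout in FIELDS is entirely
hypothetical.

```python
import struct

# Hypothetical layout: (name, byte offset, struct format, units).
FIELDS = [
    ("temperature", 0, "<f", "degC"),
    ("ph",          4, "<f", "pH"),
    ("depth",       8, "<f", "m"),
]


def extract(record: bytes, wanted) -> dict:
    """Return {name: (value, units)} for the requested fields only."""
    out = {}
    for name, offset, fmt, units in FIELDS:
        if name in wanted:
            (value,) = struct.unpack_from(fmt, record, offset)
            out[name] = (value, units)
    return out


# A made-up 12-byte record matching the hypothetical layout above.
record = struct.pack("<3f", 21.5, 7.25, 1.5)
print(extract(record, {"temperature", "depth"}))
```

Fields the instrument sends but nobody needs are simply left out of
the wanted set, so they never reach the database.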

> The approach we use is to acquire the data, stuff it immediately into a
> database (PostGres/PostGIS) then perform operations on the data.
> Once it hits the database/archive, its manipulations are not re-stored.  We
> operate on the original data for any transformations, and then report it
> that way.  Thus, we have the original data to fall back on, and a
> separate archive of the metadata associated with it.

So you take the binary data right into PostGIS as a BLOB?  Then upon
retrieval by a spatial query you apply a transformation on the binary data
to extract whatever parameters are asked for by the client?  I thought of
storing the original data, then storing a "serialized" form of it in PostGIS
as a preprocessing step, in a format that is amenable to MapServer.  I would
keep dynamic processing of the binary file for parameters that are not
stored by default in the serialized copy, instead of systematically
launching some processing script.  Would that work?
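
A minimal sketch of that two-tier idea.  It uses an in-memory SQLite
database from the Python standard library purely as a stand-in for
PostGIS, with plain lon/lat columns instead of a geometry; the table names,
the helper functions and the record layout are all hypothetical.

```python
import sqlite3
import struct

conn = sqlite3.connect(":memory:")
# Tier 1: the untouched binary payload, kept as the fallback copy.
conn.execute("CREATE TABLE raw_file (id INTEGER PRIMARY KEY, payload BLOB)")
# Tier 2: the preprocessed, map-ready subset of parameters.
conn.execute(
    "CREATE TABLE reading ("
    "raw_id INTEGER REFERENCES raw_file(id), "
    "lon REAL, lat REAL, temperature REAL)")


def ingest(payload: bytes, lon: float, lat: float) -> int:
    """Store the raw bytes, then serialize the default parameter."""
    cur = conn.execute(
        "INSERT INTO raw_file (payload) VALUES (?)", (payload,))
    (temperature,) = struct.unpack_from("<f", payload, 0)
    conn.execute("INSERT INTO reading VALUES (?, ?, ?, ?)",
                 (cur.lastrowid, lon, lat, temperature))
    return cur.lastrowid


def on_demand(raw_id: int, offset: int) -> float:
    """Decode a parameter that was not serialized, from the raw copy."""
    (payload,) = conn.execute(
        "SELECT payload FROM raw_file WHERE id = ?", (raw_id,)).fetchone()
    (value,) = struct.unpack_from("<f", payload, offset)
    return value


# Made-up payload: temperature at byte 0, pH at byte 4.
raw_id = ingest(struct.pack("<2f", 21.5, 7.25), lon=-71.9, lat=45.4)
print(on_demand(raw_id, 4))
```

The map layer would only ever query the reading table; the raw_file
table is touched when a client asks for something outside the defaults.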

> I see no reason why not.  The idea, by the way, of a sonde simulator is
> a good one.

Once I get the data in a Python structure, I can just create new instances
with modified values, write out another DAT file and that way simulate a
data acquisition event.  But I need that friggin data structure, unless I
just want to set up a cron with the same .DAT file ...
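
A minimal sketch of such a simulator, assuming a record layout were
known.  The three-float RECORD layout, the jitter model and the file name
are hypothetical stand-ins, not the real .DAT structure.

```python
import os
import random
import struct
import tempfile

# Hypothetical record: temperature, pH, depth as little-endian float32.
RECORD = struct.Struct("<3f")


def simulate(records, jitter=0.05, seed=None):
    """Return new records with each value perturbed by up to +/- jitter."""
    rng = random.Random(seed)
    return [tuple(v * (1 + rng.uniform(-jitter, jitter)) for v in rec)
            for rec in records]


def write_dat(path, records):
    """Write the records back out as a flat binary file."""
    with open(path, "wb") as fh:
        for rec in records:
            fh.write(RECORD.pack(*rec))


def read_dat(path):
    """Read a file written by write_dat() back into tuples."""
    with open(path, "rb") as fh:
        return list(RECORD.iter_unpack(fh.read()))


# One made-up reading, perturbed and written to a temporary file.
original = [(21.5, 7.25, 1.5)]
path = os.path.join(tempfile.mkdtemp(), "simulated.dat")
write_dat(path, simulate(original, seed=1))
print(read_dat(path))
```

Each call with a different seed gives a slightly different file, which is
enough to fake a fresh acquisition event for the rest of the chain.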

Thanx again for your time.  I'll dig deeper into this in a few weeks'
time.  More questions then!

Yves



