[OSGeodata] Geodata Metadata Requirements

Jo Walsh jo at frot.org
Wed Apr 12 11:45:34 EDT 2006


dear Ned, thanks for this,
On Tue, Apr 11, 2006 at 08:56:57AM -0400, Ned Horning wrote:
> Although there are advantages to adding more structure I wouldn't drop all
> of the full-text fields. I'm concerned that we would lose essential
> description information without some full-text. For example, "description"
> information is probably best left as full-text. It's fine to add structured
> fields for important "description" components but I'd hesitate dropping the
> full-text field. I may be in the minority but I actually read some of these
> text fields. 

:)
What Dublin Core prescribes for a 'description' field is
generous: "Description may include but is not limited to: an abstract,
table of contents, reference to a graphical representation of content
or a free-text account of the content."

For a more 'structured' description of what a data set contains, I'd
want either references to standard taxonomies corresponding to the
'Theme Keywords' that FGDC asks for, or free-text 'tags' (basically
just keywords) that could be mapped onto a taxonomy if that proved
useful.
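
A rough sketch of the idea in Python (the tag strings and the taxonomy
mapping are invented for illustration, not drawn from any standard):

    # Hypothetical mapping from free-text tags onto a standard taxonomy.
    THEME_TAXONOMY = {
        "elevation": "FGDC Theme Keyword: elevation",
        "roads": "FGDC Theme Keyword: transportation",
    }

    record = {
        # The full-text description survives untouched...
        "description": "Void-filled 30m DEM covering region X.",
        # ...alongside structured, free-text tags.
        "tags": ["elevation", "dem"],
    }

    def theme_keywords(record):
        """Map free-text tags onto the taxonomy where a mapping exists."""
        return [THEME_TAXONOMY[t] for t in record["tags"]
                if t in THEME_TAXONOMY]

    print(theme_keywords(record))  # ['FGDC Theme Keyword: elevation']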

> Why are there different "datasource" fields for TorrentFile, WMS,
> shapefile, etc.? It seems like a mix of data models, data compression, data
> delivery, file formats. Why not just one "datasource" field? Maybe I don't
> understand what a "datasource" is. I expect the answer is obvious but I just
> don't see it.

I look at the 'datasource' as a property that can be repeated multiple
times, once for each data source that is available. (For example,
there was some consensus here that providing a bittorrent download
facility for geodata, through geotorrent.org where that fits with
their constraints on format, would always be a good thing.) Providing
WFS/WMS interfaces to data is also a strong goal, as sketched below.
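
For concreteness, a repeated 'datasource' property might look like
this (field names and URLs are invented for illustration):

    record = {
        "title": "Some coastline data set",
        "datasource": [
            {"type": "WMS",         "url": "http://example.org/wms"},
            {"type": "WFS",         "url": "http://example.org/wfs"},
            {"type": "TorrentFile", "url": "http://example.org/coast.torrent"},
            {"type": "Shapefile",   "url": "http://example.org/coast.zip"},
        ],
    }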

The reason I thought to define several 'classes' (object types, say)
of datasource is a desire to automate download/discovery based on what
type of datasource one is presented with: for a WFS datasource object,
one might want to attach FeatureType properties to it and
automatically extract bounding box information for the extents in the
metadata; for a bittorrent datasource, one might trigger a client
download.

A motivating/illustrative example here is Edd Dumbill's DOAP schema
for describing open source software projects. It defines one
'repository' property which can point to several different classes of
repository: CVS, SVN, BitKeeper, etc.; cf.
http://www-128.ibm.com/developerworks/xml/library/x-osproj3/
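
In the same spirit, a sketch of what class-based dispatch on
datasources could look like in Python (all names are hypothetical, and
the discovery logic is only stubbed in):

    class Datasource:
        def __init__(self, url):
            self.url = url
        def discover(self):
            pass  # nothing generic to do

    class WFSSource(Datasource):
        def discover(self):
            # e.g. fetch GetCapabilities, attach FeatureType properties,
            # copy bounding boxes into the metadata record
            pass

    class TorrentSource(Datasource):
        def discover(self):
            # e.g. hand the .torrent file off to a bittorrent client
            pass

    for src in [WFSSource("http://example.org/wfs"),
                TorrentSource("http://example.org/coast.torrent")]:
        src.discover()  # behaviour varies with the class of datasource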

> What's the difference between "projection" and 'spatial reference"? Isn't a
> projection part of a spatial reference?

Ah, this is a misparse on my part - I was looking at the FGDC
description of 'Spatial Data Organization' (which is Point, Vector, or
Raster) and substituting 'Spatial Reference' in my mind; cf.
http://biology.usgs.gov/fgdc.metadata/version2/sdorg.htm
I updated the wiki page to reflect this. 

> Why do we need a different spatial reference for raster and vector models.
> What about other data models?

What other kinds of data models do you have in mind? Something that a
data repository would be likely to run into pretty soon? Is this
question less relevant given the misparse I admitted to above?

> It would probably be good to have information about ownership and how/where
> it can be accessed. 

I think the latter is encompassed in 'datasource'. For the former,
there's a property/field in FGDC for 'originator' which I included in
this model, and a separate one for 'point of contact' which isn't
included yet, but probably should be; see the sketch below.
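
In the flat-record sketch from earlier, those would just be two more
fields (names illustrative; FGDC calls them 'Originator' and 'Point of
Contact'):

    record = {
        "title": "Some coastline data set",
        "originator": "Some Mapping Agency",            # ownership
        "point_of_contact": "metadata at example.org",  # access queries
    }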

> If quality assurance is the driving force behind creating the metadata it
> would be useful to add some fields related to quality.  

The question of "quality" and "completeness" is one I don't really have
enough context for. QA is definitely a concern, and this effort should
attempt to define things that would make it easier. But how does one
describe a metric for it? If an SRTM or other DEM has holes in it, or
aerial imagery has a lot of cloud cover, there could be a quality
'percentage' indicating how much of the data set is clear (a sketch of
this follows below). But for vector data I don't find it easy to
imagine how this works: how would one assess the quality of VMap0? How
would one assess the completeness of a data set like OpenStreetMap,
which changes every day, and could *potentially* (like the Ordnance
Survey's fabled MasterMap) one day include a vector description of
every static object in an area over one metre in size... ;)
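
For the raster case, at least, the 'percentage' could be computed
mechanically; a minimal sketch, assuming a simple grid of cell values
with a known nodata sentinel (both invented here):

    NODATA = -32768  # assumed sentinel for voids / cloud-masked cells

    def quality_percentage(grid):
        """Percentage of cells in a 2-D grid that hold real data."""
        cells = [v for row in grid for v in row]
        clear = sum(1 for v in cells if v != NODATA)
        return 100.0 * clear / len(cells)

    # A 2x3 DEM tile with one void cell is ~83% clear:
    print(quality_percentage([[10, 12, NODATA], [11, 13, 14]]))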

> I hope this is useful.

Definitely; thanks very much.


jo



