[OSGeodata] IRC meeting comments

Jo Walsh jo at frot.org
Thu Apr 6 19:46:41 EDT 2006


dear Ned, thanks for this.

On Thu, Apr 06, 2006 at 05:49:31PM -0400, Ned Horning wrote:
> It's still not entirely clear to me what the initial objective is for
> hosting data but if it is simply to illustrate what can be done it doesn't
> make much difference what datasets are hosted at this point. The focus
> should probably be on getting something up and running and figuring out what
> capabilities are most useful and doable. 

Right, it's partly a "do it because we can" effort, because the hosting
facilities are available at telascience, John Graham is already
mirroring some large geodata sets there, and is interested in
expanding the collection that is being offered there. The telascience
group have had problems with incomplete / ad-hoc / absent metadata in
the past, and are keen to extract collective wisdom from OSGeo on this. 

The hope is also to provide a repository that works in support of the 
OSGeo software; a 'showcase' for the capacities of the different
applications in the stack, that can be used for demo purposes. 

At the first Geodata meeting there were enough people interested in
working on data discovery and indexing issues to make plans for a
working group on geodata discovery:
http://wiki.osgeo.org/index.php/Geodata_Discovery_Working_Group
and a repository effort is a 'test site' for working out best
practices, or at least working practices, on that. Software projects
within OSGeo (particularly GRASS which is looking for candidate replacement
packages for the spearfish data set) also have needs for
quality-assured data packages for educational and demo purposes:
http://wiki.osgeo.org/index.php/Geodata_Packaging_Working_Group
(These groups were sketched out right at the start, the first meeting
two weeks ago, and are not set in stone, but represent interest
groupings that are contained here).

I know there are a lot of similar, well-developed repository efforts in 
the world, and that as we don't have grant funding or similar, it might
make more sense to connect to, publicise and think about building
better interfaces to other people's offerings. But without a small
offering of our own, to focus and test ideas on, I don't think we'll 
have the motivation to do that. 

Ultimately, it's going to work best, or work at all, if this effort
fits well with the needs of those who are contributing to this group or
to OSGeo as a whole. 

> For example, is it reasonable to
> allow users to select subsets of vector and raster data? What about letting
> the user select from an array of data formats? What kinds of search
> capabilities are most useful? Think from the perspective of a user who is
> not very tech savvy. Will our effort improve on the data browsing options
> that are out there?

Last week, mikel pointed out http://mapdex.org/ quite strenuously. It
looks interesting, but also looks impenetrable to non-experts. It
has its own custom API and some interesting visual aids, but they don't
connect back into the human-writable searches. For me, this should be
about much more than data browsing; it should be about being able to plug
into applications straight away. We have the advantage within OSGeo in that
this effort can be joined right up with the free software nearby -
both in terms of offering visual interfaces to data discovery through
clients like Mapbuilder, and offering joined-up configuration, or at
least really helpful walkthroughs, for getting data into Geoserver,
Mapserver or osgPlanet.

> For metadata I don't see much value in creating a new standard. One
> advantage of using FGDC or a subset of it is that most of the data that is
> freely distributed probably already has FGDC format metadata associated with
> it. Whatever is decided for a metadata format it should at least be
> compatible with an existing system so it's easy to map fields between
> systems. I'm not sure if FGDC will be a barrier to new contributions. Do we
> expect to be generating large amounts of new metadata?  

I don't think anyone is proposing to create a new standard. But one
confusion I have with the FGDC standard is that it provides a standard
internal model, but not a standard transmission model with corresponding 
tools to produce and consume it. *Please* correct me if I'm wrong on
this, and help build a collective understanding of what one is really
getting from FGDC standards in terms of making metadata maximally
re-usable. I'm disturbed by the variations even in the short list at
http://www.fgdc.gov/metadata/geospatial-metadata-tools#availabletools

Personally, I come to this whole discussion from an RDF perspective,
and so my angle on this is probably more obtuse than most ;)
That gives me expectations that: 
- where there's already a well-known vocabulary (Dublin Core, FOAF etc) 
  that provides a good fit for metadata about a certain class of thing, it
  should be re-used
- plain-text descriptions of properties aren't as useful as lists of
  properties that are enumerated in a namespace
- data should be consumable without a need for a human to examine a
  file, write or run any custom parser and figure out what's going on
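
As a rough illustration of those three points, here is a minimal Python
sketch that describes one dataset using Dublin Core terms and writes it
out as N-Triples. The dataset URI, the title and the choice of licence
are made up for the example, and treating any value that starts with
"http" as a URI is a simplifying assumption, not a rule from any standard.

```python
# Hypothetical sketch: dataset metadata as Dublin Core properties,
# serialized as N-Triples using only the standard language features.
DCTERMS = "http://purl.org/dc/terms/"  # Dublin Core terms namespace


def to_ntriples(subject, properties):
    """Return one N-Triples line per {term: value} pair about `subject`."""
    lines = []
    for term, value in sorted(properties.items()):
        if value.startswith(("http://", "https://")):
            # Assumption for this sketch: http(s) values are resources.
            obj = "<%s>" % value
        else:
            # Everything else becomes a plain string literal.
            escaped = (value.replace("\\", "\\\\")
                            .replace('"', '\\"')
                            .replace("\n", "\\n"))
            obj = '"%s"' % escaped
        lines.append("<%s> <%s%s> %s ." % (subject, DCTERMS, term, obj))
    return "\n".join(lines)


# Made-up example record; the property names are real Dublin Core terms.
record = {
    "title": "Example elevation tiles",
    "format": "image/tiff",
    "license": "http://creativecommons.org/licenses/by/2.5/",
}

print(to_ntriples("http://example.org/data/elevation", record))
```

The point is only that each property name lives in a namespace that a
generic tool can look up, and that the licence is a URI rather than a
paragraph of free text, so nothing here needs a custom parser.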

Also I and others in this group come from a cultural context in which
most state-collected geographic information is not in the public domain; 
in which there is a proprietary, "intellectual property"/copyright oriented 
stance towards geodata which is holding up the development of exchange
standards and data sharing practices. So there's very little sample
material to work from, and common practice in the US is a long way
ahead, just because the data is available to researchers and to
freelance experimenters. 

FGDC is an ideal goal to have in mind, but it's not necessarily a 
description of many people's needs from outside the US, nor of 
people's needs in terms of collaborative mapping projects
like openstreetmap, where asking for full FGDC compliance may not be
viable, and where there are qualities in the data that need to be in the
metadata (licensing terms available in human/machine readable form)
that FGDC doesn't cover. 

> I also wanted to let folks know about an initiative in the conservation
> community that is along the lines of what it seems we are trying to do. It's
> a Conservation GeoPortal
> (http://conserveonline.org/workspaces/cons.geo.portal) and they are mulling
> over some of the same issues we are. I have encouraged the lead on this
> project, Frank Biasi, to join this group as I expect he will have a lot to
> add. 

This would be great; I hope that the group here can provide a 'safe
space' for sharing implementation concerns amongst builders.
We heard recently about http://conservationcommons.org/ from
Gary Geller of JPL; are these connected efforts at all?

Thanks again for the questions; my answers to them are far from
definitive, and I'd be interested to hear others'...

cheers,


jo



