[OSGeodata] Webcrawlers perspective or: How to boost your geodata?

Jo Walsh jo at frot.org
Tue Oct 3 03:49:27 EDT 2006


dear Stefan, thanks for this,

On Mon, Oct 02, 2006 at 02:43:45PM +0200, Stefan F. Keller wrote:
> Just collected some thoughts about "Webcrawlers perspective or: How to boost
> your geodata?" at
> http://wiki.osgeo.org/index.php/Geodata_Metadata_Requirements#Discovery.

My mind is sleeplessly abuzz with interconnected Stuff as I look at
all this from different angles and try to catch up on the discussion.
So here is a rambling braindump about what I am picking up. 

Where we found "two sides, same coin" there are at least two distinct 
models. One describes geodata; another describes OGC web services 
interfaces through which one can access that geodata. The second is
essentially expressed by W*S GetCapabilities and the W*S specs.
(We can get into services that query via or for styles, or that pass
contexts around and try to recommend layers/features, later...)
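
For illustration, here is a minimal sketch of how little a client needs to do to pull the service-side model out of a GetCapabilities response. The XML below is a hypothetical, heavily trimmed WMS 1.1.1-style document (no XML namespaces; a WMS 1.3.0 response is namespaced and would need qualified tag names), and `list_layers` is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed WMS 1.1.1-style GetCapabilities response.
SAMPLE_CAPABILITIES = """
<WMT_MS_Capabilities version="1.1.1">
  <Service><Name>OGC:WMS</Name><Title>Example server</Title></Service>
  <Capability>
    <Layer>
      <Title>All layers</Title>
      <Layer><Name>roads</Name><Title>Roads</Title></Layer>
      <Layer><Name>rivers</Name><Title>Rivers</Title></Layer>
    </Layer>
  </Capability>
</WMT_MS_Capabilities>
"""

def list_layers(capabilities_xml):
    """Return (name, title) for every Layer element that has a Name child."""
    root = ET.fromstring(capabilities_xml.strip())
    found = []
    for layer in root.iter("Layer"):
        name = layer.findtext("Name")  # direct child only; grouping layers have none
        if name is not None:
            found.append((name, layer.findtext("Title", default="")))
    return found

layers = list_layers(SAMPLE_CAPABILITIES)
print(layers)
```

Note that this only recovers what the service says about itself; nothing here describes the underlying geodata, which is the other model.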

OWS is a disturbingly small portion of the geodata ecology (and at
least one strand of the "how much we need a simple, distributed
geodata 'catalog'" conversation is an attempt to rectify that by
demonstrating the usability/reuse value inherent in OWS?)

What about indexing KML, GeoRSS, these other new age geography forms
of data? Paul and Stefan both have talked about the lack of innate 
'linkedness' of spatial data sets, and how the crawler-based search
model that works in the wild so well doesn't work over web services.
RDF has a dumb fix for this, whereby one can attach to the subject of
any statement a predicate-object pair saying, in relation to this 
"rdfs:seeAlso [whatever graph is contained in this URL over here]"
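
A rough sketch of what that fix buys a crawler, assuming each document has already been parsed into (subject, predicate, object) triples. The URLs and the in-memory `DOCS` table are made up for illustration and stand in for real fetching and RDF parsing.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical documents: URL -> list of (subject, predicate, object) triples.
DOCS = {
    "http://example.org/a.rdf": [
        ("http://example.org/a#roads", RDFS_SEE_ALSO, "http://example.org/b.rdf"),
    ],
    "http://example.org/b.rdf": [
        ("http://example.org/b#rivers", RDFS_SEE_ALSO, "http://example.org/a.rdf"),
        ("http://example.org/b#rivers", RDFS_SEE_ALSO, "http://example.org/c.rdf"),
    ],
    "http://example.org/c.rdf": [],
}

def crawl(start, fetch):
    """Breadth-first walk over documents linked by rdfs:seeAlso."""
    seen = [start]
    queue = deque([start])
    while queue:
        url = queue.popleft()
        for _subject, predicate, obj in fetch(url):
            # Follow only seeAlso links, and never revisit a document.
            if predicate == RDFS_SEE_ALSO and obj not in seen:
                seen.append(obj)
                queue.append(obj)
    return seen

visited = crawl("http://example.org/a.rdf", lambda url: DOCS.get(url, []))
print(visited)
```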

Features don't have much use for that facility. But they do have an
amazing, 3 dimensional capacity of linkedness - the spatial one -
where the containment relationships are literal, physical. Features
relate through things recorded in places people notice. The
neogeoblogosphere is filling up with these things. 
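
To make the "containment is a link" idea concrete, here is a small sketch that derives links between features purely from their extents. It is a simplification: it uses 2D bounding boxes rather than real geometries, ignores the antimeridian, and the feature names and coordinates are invented.

```python
from typing import NamedTuple

class BBox(NamedTuple):
    west: float
    south: float
    east: float
    north: float

def contains(outer, inner):
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    return (outer.west <= inner.west and outer.south <= inner.south
            and outer.east >= inner.east and outer.north >= inner.north)

def containment_links(features):
    """Return (container, contained) name pairs implied by the extents."""
    return [(a, b)
            for a, box_a in features.items()
            for b, box_b in features.items()
            if a != b and contains(box_a, box_b)]

# Hypothetical features with made-up extents.
FEATURES = {
    "city": BBox(0.0, 0.0, 10.0, 10.0),
    "park": BBox(2.0, 2.0, 4.0, 4.0),
    "bench": BBox(3.0, 3.0, 3.1, 3.1),
}

links = containment_links(FEATURES)
print(links)
```

No one had to author these links; they fall out of where things are.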

If you had to write a 5-10 year plan for where geometadata search ->
data reuse is going, just on a technological basis, would OWS be in it?
I would not like to design an architecture now that assumed so. 
I would like to stay transmission-neutral and view OWS as just another
source and sink for data among many. 

So here are a lot of people with different perspectives on the same
place. I want to be able to build a distributed data library that is
really alive, that is set up to do a lot of dynamic data repackaging 
for local and offline use, that builds up a lot of tasty
machine-intelligible information as a byproduct of human semantic
entropy. Others are hyperfocused on cracking the OWS problem and
opening up the valve on the amazing amounts of public geodata that
*is* getting out there, just not being openly reused or made easy to
get at... and on finding ways to build better userfriendly or at least
userapproachable map-oriented data search interfaces... or on
building intelligent client apps to stresstest what is out there and
honestly do all the hoopjumping in the standards orientation. Yet
somehow we are all here thinking we are talking about the same thing.

I would like to know about the history of GetCapabilities a bit and
how that came to be carrying so much metadata which is disconnected
from the process of what is being published - the "install geoserver,
fill in semi-random data when prompted, neglect for N years" cycle.

The first point at which I went "eek" when looking at geometadata
standards was seeing in FGDC, metadata properties for the creator of
the metadata set. (I am assuming ISO 19115 looks the same way). To me
this seems upside down. It's no use for client *or* server interfaces.
A model that asks you to describe your descriptions of data, that
starts with the description and not with what happens in the data, 
makes no sense to me. More "no good for humans, no good for machines". 
Also wedded to the one-authoritative-source stance on
state/corporate-collected data - and we know this is passing, right?

Data packaging is something I want to think more about; this problem
space is not only about data and data-about-data transmission, it's
also about the exchange of rules and algorithms along with data (think,
for a moment, GeoDRM) and about small-footprint easy-install software
packages that are inseparable from the data. This looks inevitable and
it does not look like OWS as they are currently shaped.

Some people claim we are living in a digital dark age. Some people
claim we are living in a digital golden age. I know that I want to
have the data I care about and that might help debug the world around
me as physically close as possible and also as widely distributed as
possible. On this sombre, postmillenarian note I really should stop
typing. If you made it this far, thanks for listening to me get some
of the junk out of my head in no particular order.

cheers,


jo   



