[okfn-discuss] Fw: [Geodata] discoverability and the wiki

Aaron Straup Cope asc at spum.org
Sat Oct 6 12:18:27 EDT 2007

One of the principal motivations behind using SMW for the wine site 
(grape.spum.org) was laziness.

At the time (I was washing the dishes) I briefly considered it as an 
opportunity to play with Rails and/or Django and then quickly decided I 
couldn't be bothered with setting up databases and managing 
dependencies; both of which quickly descend into the tedium of managing 
relationships and input validation. Or :


The decision was also influenced by an ongoing struggle about how to 
bridge the gap (read : chasm) between people and machines for "storing" 
recipes; a problem that is fantastically harder than it seems. Or :


When I finally started poking around how to do stuff in MW the one thing 
that stunned me was, in fact, how complicated many of the articles were.

Like anything else, it had developed its own language of specialization 
in the same way that people have adapted their practice (and 
expectation) for things like tagging in delicious. Or any wiki, for that 
matter. Or :


Whether or not a registry of geodata lends itself to that kind of 
practice remains, of course, an open question.

At this point, it is probably also worth pointing out that I am 
intimately involved with the "machine tags" work at Flickr so if my 
biases aren't already clear let there be no doubt :-)


The thing about machine tags is that they are RDF by any measure. The 
key difference being : You don't worry about namespaces unless you want 
to. In a controlled environment, like the SMW, though you can simply set 
up a registry of known prefixes and let the computrons sort it out.
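
To make the prefix idea concrete, here is a minimal sketch in Python. It is 
not Flickr's or SMW's actual code: the registry contents, the function names 
and the simplified pattern for namespace:predicate=value are all assumptions 
made for illustration.

```python
import re

# Hypothetical registry of known prefixes (an assumption for this sketch);
# a controlled environment would maintain its own list.
KNOWN_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# Simplified pattern for namespace:predicate=value.
MACHINE_TAG = re.compile(
    r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$"
)


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for a plain tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.groups()


def expand(tag, prefixes=KNOWN_PREFIXES):
    """Expand a machine tag into a (property, value) pair.

    Registered namespaces become full URLs; unregistered ones are left
    as namespace:predicate, so nobody has to care about namespaces
    unless they want to.
    """
    parsed = parse_machine_tag(tag)
    if parsed is None:
        return None
    namespace, predicate, value = parsed
    base = prefixes.get(namespace)
    if base is None:
        return (namespace + ":" + predicate, value)
    return (base + predicate, value)
```

With this sketch, expand("dc:coverage=foo") yields the full Dublin Core 
property URL paired with "foo", while a tag in an unregistered namespace 
passes through with its short prefix intact.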

So, perhaps one approach would be to simply update the CKAN to (I am 
happy to submit patches once I've looked at the code and my mother isn't 
visiting... ;-) store machine tags to allow for chunks of arbitrary 
domain-specific metadata, per Rufus' comment.
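
As a rough sketch of what storing machine tags against a package might look 
like (this is not CKAN's data model, which I have not looked at; the class 
and method names are hypothetical):

```python
from collections import defaultdict


class MachineTagStore:
    """In-memory store of machine tags per package (illustrative only)."""

    def __init__(self):
        self._by_package = defaultdict(set)
        self._by_triple = defaultdict(set)

    def add(self, package, tag):
        """Attach a namespace:predicate=value tag to a package."""
        head, equals, value = tag.partition("=")
        namespace, colon, predicate = head.partition(":")
        if not (equals and colon and namespace and predicate and value):
            raise ValueError("not a machine tag: %r" % tag)
        triple = (namespace, predicate, value)
        self._by_package[package].add(triple)
        self._by_triple[triple].add(package)

    def tags_for(self, package):
        """Return the sorted (namespace, predicate, value) triples."""
        return sorted(self._by_package.get(package, ()))

    def find(self, namespace, predicate, value):
        """Exact-match lookup only; no ranking or relevance."""
        return sorted(self._by_triple.get((namespace, predicate, value), ()))
```

The lookup here is exact match on the whole triple, which is the trivial 
case and says nothing about relevance.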

This is, in fact, really easy until you get to the search part. Or :


And there's the rub. The search part -- not only finding, but finding 
relevant answers -- is always going to be the hard part because implicit 
in the "problem statement" (or solution) is that someone has managed to 
write the Do What I Mean engine.

The RDF weirdos like to believe that TBL's magic layer cake of 
trust+proof+logic is the answer, which is madness. The Google people 
like to believe that their special "We're smarter than you" sauce is the 
answer, which is hubris. Social networking sites like to believe that 
your contacts have the answer, which is wishful thinking.

Meanwhile the CPAN is probably the only tool that has ever managed to 
gracefully ("gracefully") dance around the problem; although often at 
the expense of needing to install half the Internet just to add support 
for plain text sprockets...

Which is a very long way of saying : I don't think that there's really a 
need to worry about "random" yet.

It will be messy, for sure, but I tend to think it is more important to 
let people add data quickly and easily than it is to try to imagine how 
it will sort itself out in the end.

The sorting out is important but that's always going to be subject to 
both the magic (computers) and conventions (humans) of the day.

By which I mean to say : horse=yes!

Rufus Pollock wrote:
> Jo Walsh wrote:
>> dear all,
> This is really interesting Jo (and Aaron). At this point I feel honour 
> bound to mention CKAN :) given that it was expressly designed for 
> holding basic metadata for open knowledge/data projects and packages:
>   http://www.ckan.net/
>> i enjoyed this walkthrough of making a semi-structured metadata registry
>> with semantic mediawiki, this one in the context of a distributed 
>> geodata repository. Thanks for writing this, Aaron. I am never sure 
>> about the
>> amount of cognitive load such a detailed syntax would impose on
>> potential contributors. But if one is committed, it is better than
> This was precisely one of the reasons for going for a more web-app type 
> approach on CKAN (that plus the desire to do 'full' versioning of data 
> and the fact we got started before things like SMW were available ...).
> One of the things we want to do asap with CKAN is add support for 
> plugins that will allow people to add extra metadata in specific subject 
> areas (of course one could also write a simple extension to allow 
> 'arbitrary' metadata to be added but often having some constraints is 
> useful -- if someone is entering data related to shakespeare they don't 
> necessarily want to be asked about long/lat extents).
>> just having notes on a wiki. It seems also to be an inverted version
>> of the old public domain works wiki, initially generated by a dump
>> from a structured source. 
> Jo (in particular) I note that you've already got some listing of data 
> sources and listings on:
> http://wiki.osgeo.org/index.php/Geodata_Discovery_Working_Group
> specifically at:
> <http://wiki.osgeo.org/index.php/Geodata_Discovery_Working_Group#Existing_.28Meta.29_Search_Projects_and_Related_Efforts> 
> Would it be possible to add CKAN to the list of data catalogues there? 
> In addition I wonder if you could be persuaded :) to add a few of these 
> items into CKAN, we already have some geo related material:
> http://www.ckan.net/tag/search?search_terms=geo
> http://www.ckan.net/tag/read/geodata
> but it would be good to get more. More comments on Aaron's excellent 
> efforts inlined below.
> ~rufus
> PS: since I'm not on the geo at lists.osgeo.org would you mind forwarding 
> it there. I'd be very interested to get further responses in this thread 
>  ...
>> ----- Forwarded message from Aaron Straup Cope <asc at spum.org> -----
>> Date: Fri, 05 Oct 2007 06:29:36 -0700
>> From: Aaron Straup Cope <asc at spum.org>
>> To: geodata at lists.osgeo.org
>> Hellos,
>> I recently attended FOSS4G, in Victoria, and stopped in during the 
>> open geodata BOF.
>> One of the issues people raised was how to organize and find all of 
>> the possible data that may be housed on osgeo servers.
>> Since there is already a working instance of Mediawiki I wondered 
>> aloud whether something like the Semantic MediaWiki (SMW) extensions 
>> would be useful.
>>     http://meta.wikimedia.org/wiki/Semantic_MediaWiki
>> Let me pause briefly to just say : 1) I don't really like wikis either 
>> and 2) I am not going to rain on everyone's parade with pedantic 
>> semweb hocus pocus. No, really.
>> But.
>> The SMW stuff does make it pretty easy to add just that little bit of 
>> extra data so that you aren't living and dying by full-text search 
>> alone and MW templates, once you suffer the initial setup, make it 
>> possible to mostly hide all of the hard stuff.
>> Both are still fraught with their own ongoing issues but they save 
>> people from having to write something from scratch and it's a 
>> reasonable 80/20 solution to the problem of making it easy enough to 
>> bother entering data but detailed enough to make it worth getting it 
>> back out again.
>> Maybe.
>> Eventually someone said : It sounds like you're volunteering. At 
>> which point it became bad form not to at least put together a proof of 
>> concept.
> That's the way to go :)
>> So here it is, with details (and bugs) below : http://proj.spum.org/
> Fast work!
>> (Also : I am not wed to any of this and I offer it up only as a 
>> suggestion. This is all stuff that I am interested in beyond any needs 
>> to index and discover open geodata so I'm not going to take my toys 
>> and leave if people decide it doesn't fit their needs.)
> I do think there is a distinction here between creating an (inevitably 
> limited but perhaps higher quality) registry and discovery. For example 
> you have Freshmeat to list open source projects but not necessarily all 
> projects are on there. At present I still think we need to have some 
> kind of registry because random discovery (using RDF tags in pages or 
> some kind of microformats) out in the wild is just too, well 'random' 
> and the metadata quality is too low -- that's why we're developing CKAN. 
> In the long run this may change (just doing CKAN I'm already amazed at 
> the amount of material ...)
>> ---
> [snip]
>> ---
>> Here's a "complicated" example :
>> # http://www.proj.spum.org/index.php?title=SomeProject
>> {{Project|Bob Exampolopolis|Mr. Nubby}}
>> == Description ==
>> This is a fuzzy project!
>> {{Tags|fuzzy|dice|muffins}}
>> == Meta ==
>> {{meta|dc|coverage|foo}}
>> # http://www.proj.spum.org/index.php?title=SomeProject_0.9
>> {{ProjectRelease|2007-09-01|cc-by-3.0}}
> This is remarkably similar to the basic metadata of CKAN -- Great minds 
> think alike ;-)
>> ---
>> In the example above tags actually get added as "dc subject" 
>> properties (as well as categories) with all the work being hidden in 
>> the Tags template.
>> The Meta template is just a more general way to add domain specific 
>> data. Prefixes, like dc, can be registered in SMW such that they are 
>> recognized and expanded to proper URLs.
>> ---
>> Out of the box, SMW will let you search by properties. For example :
> [snip]
>> And, yes, the {{for|call}} stuff (well, actually, all of it) is a 
>> little like stabbing yourself in the eyes. That's why you hide it all 
>> in templates.
>> The <ask> stuff works great where it works. And not so much where it 
>> doesn't. For example :
> [snip]
>> Have a poke around. If you're feeling brave follow some of the 
>> templates but you may want to cry. If you're interested in playing 
>> with a related project there is also :
>>     http://grape.spum.org/
>> This one has a More Better (tm) search interface/API but only because 
>> I started to abuse the actual SMW source code. They have since updated 
>> things and I can't face whatever changes I'll need to make as a result...
>>     http://grape.spum.org/pages/HowToSearch
> This was one of the reasons we stuck with the pure webapp approach 
> (python + pylons) for CKAN -- by this point MW (or even SMW) were really 
> being used in a webapp type way. Given that they weren't really 
> designed for this our worry was that while you could go pretty fast at 
> the start you were likely to suddenly hit a serious inflection point at 
> some point.
