[okfn-discuss] Fw: [Geodata] discoverability and the wiki

Tue Oct 9 11:26:47 EDT 2007

Me and my big mouth...

I have checked out a copy of trunk, for CKAN. I will poke at it when I 
can during the month, though I doubt I will be able to give it the 
machine tag love before November.

Profit!

Rufus Pollock wrote:
> Aaron Straup Cope wrote:
>> One of the principal motivations behind using SMW for the wine site 
>> (grape.spum.org) was laziness.
>>
>> At the time (I was washing the dishes) I briefly considered it as an 
>> opportunity to play with Rails and/or Django and then quickly decided 
>> I couldn't be bothered with setting up databases and managing 
>> dependencies; both of which quickly descend into the tedium of 
>> managing relationships and input validation. Or :
>>
>> http://www.aaronland.info/weblog/2006/12/17/meat/#papernet
> 
> Well said -- though the danger with such mods of wikis (and I speak with 
> a little experience of messing around with MoinMoin -- and MW to a much 
> lesser extent -- when thinking about e.g. CKAN) is that eventually you 
> are using them as a web-app development toolkit which is *not* what 
> they were really designed for. However the point is taken that one wants 
> to get moving quickly.
> 
>> The decision was also influenced by an ongoing struggle about how to 
>> bridge the gap (read : chasm) between people and machines for 
>> "storing" recipes; a problem that is fantastically harder than it 
>> seems. Or :
>>
>> http://www.aaronland.info/weblog/2007/08/21/address/#doom
> 
> Yes! There is a fundamental trade-off as you put it:
> 
> "walking the line between making it easy enough for people to bother 
> putting data in to a system and still useful enough to make it worth the 
> trouble of getting it out."
> 
>> When I finally started poking around how to do stuff in MW the one 
>> thing that stunned me was, in fact, how complicated many of the 
>> articles were.
>>
>> Like anything else, it had developed its own language of 
>> specialization in the same way that people have adapted their practice 
>> (and expectation) for things like tagging in delicious. Or any wiki, 
>> for that matter. Or :
>>
>> http://www.aaronland.info/weblog/2007/02/17/platform/#wall
> 
> To repeat my point earlier in more lapidary form:
> 
> "When you start using a swiss army knife to build a house both the 
> house and the swiss army knife suffer"
> 
>> Whether or not a registry of geodata lends itself to that kind of 
>> practice remains, of course, an open question.
>>
>> At this point, it is probably also worth pointing out that I am 
>> intimately involved with the "machine tags" work at Flickr so if my 
>> biases aren't already clear let there be no doubt :-)
>>
>> http://www.flickr.com/groups/api/discuss/72157594497877875/
>>
>> The thing about machine tags is that they are RDF by any measure. The 
>> key difference being : You don't worry about namespaces unless you 
>> want to. In a controlled environment, like the SMW, though you can 
>> simply set up a registry of known prefixes and let the computrons sort 
>> it out.
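
A minimal sketch of the "registry of known prefixes" idea, in Python since
that is what CKAN is written in. It assumes the namespace:predicate=value
form described in the Flickr thread linked above; the exact character rules
in the pattern, the names parse_machine_tag and expand, and the choice of
which prefixes to register are illustrative assumptions, not anything CKAN
or SMW ships.

```python
import re
from typing import NamedTuple, Optional

# Assumed syntax: namespace:predicate=value, where namespace and predicate
# start with a letter and contain only letters, digits and underscores.
MACHINE_TAG = re.compile(
    r"^(?P<namespace>[A-Za-z][A-Za-z0-9_]*):"
    r"(?P<predicate>[A-Za-z][A-Za-z0-9_]*)="
    r"(?P<value>.+)$"
)

# Registry of known prefixes. Which prefixes a site registers is a local
# decision; these two are only examples.
KNOWN_PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the parsed tag, or None if raw is just an ordinary tag."""
    match = MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    return MachineTag(**match.groupdict())


def expand(tag: MachineTag) -> Optional[str]:
    """Return a full property URI if the namespace is registered, else None."""
    base = KNOWN_PREFIXES.get(tag.namespace)
    return None if base is None else base + tag.predicate
```

Tags with an unregistered namespace still parse; they simply do not expand
to a URI, which is the "don't worry about namespaces unless you want to"
part.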
> 
> Sure. But I am not sure that 'namespace' issues are the big one. 
> Ultimately mapping from a well defined domain object in code to RDF or 
> to anything else (json/xml ...) isn't that hard. What is usually hard 
> (or perhaps time-consuming) is getting a good domain model and having 
> the good user interface (including getting good performance -- e.g. 
> because of the versioned nature of the domain model in CKAN, loading 
> certain pages (fortunately not very important ones at present) takes a 
> while -- I've also noticed that e.g. del.icio.us has started to get 
> quite unresponsive. These sorts of things mean people 'just leave').
> 
>> So, perhaps one approach would be to simply update the CKAN to (I am 
>> happy to submit patches once I've looked at the code and my mother 
>> isn't visiting... ;-) store machine tags to allow for chunks of 
>> arbitrary domain-specific metadata, per Rufus' comment.
> 
> This I think *is* indeed a neat way to go.
> 
>> This is, in fact, really easy until you get to the search part. Or :
>>
>> http://www.aaronland.info/weblog/2007/08/24/aware/#mtdb
>>
>> And there's the rub. The search part -- not only finding, but finding 
>> relevant answers -- is always going to be the hard part because 
>> implicit in the "problem statement" (or solution) is that someone has 
>> managed to write the Do What I Mean engine.
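
To make the easy half of "the search part" concrete, here is a rough
in-memory sketch of exact-match lookup over stored machine tags, where any
of namespace, predicate or value can be left out as a wildcard. The class
name MachineTagIndex, its methods and the package names in the usage note
are all hypothetical. It says nothing about relevance, which is the hard
half.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (namespace, predicate, value)


class MachineTagIndex:
    """Toy index mapping package names to their machine tag triples."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[Triple]] = {}

    def add(self, package: str, namespace: str, predicate: str, value: str) -> None:
        self._tags.setdefault(package, set()).add((namespace, predicate, value))

    def find(
        self,
        namespace: Optional[str] = None,
        predicate: Optional[str] = None,
        value: Optional[str] = None,
    ) -> List[str]:
        """Return sorted package names with a tag matching every given part.

        A part left as None acts as a wildcard. Matching is exact string
        equality only: no ranking, no fuzziness, no Do What I Mean.
        """
        wanted = (namespace, predicate, value)
        hits = []
        for package, triples in self._tags.items():
            for triple in triples:
                if all(w is None or w == part for w, part in zip(wanted, triple)):
                    hits.append(package)
                    break
        return sorted(hits)
```

For example, after index.add("osm-extract", "geo", "format", "shapefile"),
calling index.find(namespace="geo") returns every package carrying any geo
tag, while index.find(predicate="format", value="shapefile") narrows it to
one predicate and value.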
> 
> But that leads back to the fundamental trade-off:
> 
> "More structure means harder for people to enter (so less of it) but 
> easier to find stuff and join it together in interesting ways"
> 
> Conversely
> 
> "Less structure (dare I mention 'horse=yes'!) means easier to enter data 
> but harder to find and join it together"
> 
> Depending on where your constraints are you go one way or the other 
> (e.g. if you have a bunch of librarians who will religiously use all the 
> metadata fields then go for structure, but if you are hoping people will 
> just drop in off the 'net and do it you'd better make it damn easy to get 
> stuff in there).
> 
>> The RDF weirdos like to believe that TBL's magic layer cake of 
>> trust+proof+logic is the answer, which is madness. The Google people 
>> like to believe that their special "We're smarter than you" sauce is 
>> the answer, which is hubris. Social networking sites like to believe 
>> that your contacts have the answer, which is wishful thinking.
> 
> Indeed.
> 
>> Meanwhile the CPAN is probably the only tool that has ever managed to 
>> gracefully ("gracefully") dance around the problem; although often at 
>> the expense of needing to install half the Internet just to add 
>> support for plain text sprockets...
> 
> :)
> 
>> Which is a very long way of saying : I don't think that there's really 
>> a need to worry about "random" yet.
>>
>> It will be messy, for sure, but I tend to think it is more important 
>> to let people add data quickly and easily than it is to try to imagine 
>> how it will sort itself out in the end.
> 
> That's my feeling, though the kicker here is that one might want some 
> structure in order to have nice interfaces that let people add stuff 
> more easily. e.g. you might want to only show the geodata related stuff 
> on geodata package pages rather than the other 3000 tags people have 
> used for other types of material but maybe even this is too much!
> 
>> The sorting out is important but that's always going to be subject to 
>> both the magic (computers) and conventions (humans) of the day.
>>
>> By which I mean to say : horse=yes!
> 
> By which you mean that for this kind of stuff getting people to enter 
> data is the constraining factor and we can work on getting info back later. I 
> basically agree and that is to some extent why CKAN is the way it is (no 
> text in RDF in a wiki stuff which you so poetically described as 
> stabbing yourself in the eyeballs ...).
> 
> ~rufus