[OSGeo-UK] Unlock Text

Jo Walsh jo.walsh at ed.ac.uk
Thu Jan 7 11:40:37 EST 2010


dear Andrew,

Andrew Larcombe wrote:
> This is great, thanks. What is the relevance/meaning/scale of the score, 
> clusteriness etc attributes that are returned with the places in the xml?

Right, these are byproducts of the reasoning that the "georesolver" 
stage of the geoparsing process uses to figure out, given a list of 
candidate matches for each of a set of placenames, which one is most 
likely to be "right".

The internals are a black box to me, but this is my rough understanding 
of how it works:

The places named in a document are reasonably likely to be clustered 
together. The 'clusteriness' is a measure of how close each candidate 
georeference for a placename is to the georeferences of all the other 
placenames in the document. If there are a lot of names, each with a 
lot of candidates, this quickly gets quite compute-intensive, so it tops 
out after the first 20 or so guesses for each place. The scaling is "in 
the range 0-1, using logarithmic scaling". This, along with other stats 
like population, is used to help rank the most likely guesses for 
locations of placenames.
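
To make the idea concrete, here is a minimal Python sketch of a
clusteriness-style score. It is not the georesolver's actual algorithm,
which is not described here. The function names, the use of mean
great-circle distance, the 20000 km normaliser and the exact form of the
log scaling are all assumptions made for illustration; only the cap of
about 20 candidates and the 0-1 logarithmic range come from the
description above.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_CANDIDATES = 20        # assumed cap, from "the first 20 or so guesses"
MAX_DISTANCE_KM = 20000.0  # assumed normaliser, about half the Earth's circumference


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    # Clamp to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def clusteriness(candidates):
    """Score every candidate location of every placename in the range 0-1.

    `candidates` maps a placename to its list of (lat, lon) guesses.
    Returns a dict mapping each placename to a list of (point, score) pairs.
    """
    capped = {name: pts[:MAX_CANDIDATES] for name, pts in candidates.items()}
    scores = {}
    for name, pts in capped.items():
        # All candidate locations belonging to the *other* placenames.
        others = [p for other, ps in capped.items() if other != name for p in ps]
        scored = []
        for pt in pts:
            if not others:
                scored.append((pt, 0.0))  # nothing to cluster with
                continue
            mean_km = sum(haversine_km(pt, o) for o in others) / len(others)
            # Hypothetical log scaling: 1.0 at zero distance, 0.0 at
            # MAX_DISTANCE_KM or beyond.
            score = 1.0 - math.log1p(mean_km) / math.log1p(MAX_DISTANCE_KM)
            scored.append((pt, max(0.0, min(1.0, score))))
        scores[name] = scored
    return scores


# Example: "Perth" is ambiguous between Scotland and Western Australia.
example = {
    "Perth": [(56.396, -3.437), (-31.950, 115.860)],
    "Dundee": [(56.462, -2.970)],
    "Stirling": [(56.117, -3.937)],
}
for place, scored in clusteriness(example).items():
    print(place, [(pt, round(score, 2)) for pt, score in scored])
```

In this example the Scottish Perth scores higher than the Australian
one, because the other names in the document (Dundee, Stirling) resolve
to points close to it. A real resolver would combine such a score with
other evidence, such as population, before ranking.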

Our collaborators at LTG have a paper in the works which explains the 
inner workings of the geoparser in much more depth. I hope to be able to 
link it from the site when it's publishable, and I'll send a link here.

As the clusteriness and popularity properties are already taken into 
account in the ranking order of the search results, I am not sure how 
useful it is to expose them to others, and we should probably just 
remove them from the result feeds.

cheers,


jo

-- 
Jo Walsh

Unlock places - http://unlock.edina.ac.uk/
phone: +44 (0)131 650 2973
skype: metazool

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


