[OSGeo-UK] Unlock Text
Jo Walsh
jo.walsh at ed.ac.uk
Thu Jan 7 11:40:37 EST 2010
dear Andrew,
Andrew Larcombe wrote:
> This is great, thanks. What is the relevance/meaning/scale of the score,
> clusteriness etc attributes that are returned with the places in the xml?
Right, these are byproducts of the reasoning that the "georesolver"
stage of the geoparsing process uses to figure out, given a list of
candidate matches for each of a set of placenames, which one is most
likely to be "right".
The internals are a black box to me, but this is my rough understanding
of how it works -
The places named in a document are reasonably likely to be clustered
together. The 'clusteriness' is a measure of how close each georeference
for a placename is to all the other georeferences for placenames.
If there are a lot of names each with a lot of candidates, it quickly
gets quite compute-intensive, so the calculation tops out after the first
20 or so guesses for each place. The scaling is "in the range 0-1, using
logarithmic scaling". This, along with other stats like population, is
used to help rank the most likely guesses for locations of placenames.
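
To make the idea concrete, here is a rough sketch of how a score like
this could be computed. It is not the georesolver's actual code, which I
have not seen. The function names, the 20-candidate cap, the use of mean
great-circle distance, the 20000 km upper bound and the exact form of the
log scaling are all my own assumptions for illustration.

```python
import math

MAX_CANDIDATES = 20        # assumed cap, from "the first 20 or so guesses"
MAX_DISTANCE_KM = 20000.0  # assumed bound, roughly half the Earth's circumference


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))


def clusteriness(candidates):
    """Score every candidate of every placename (hypothetical scoring).

    candidates maps a placename to a list of (lat, lon) guesses.
    Returns a dict mapping each placename to a list of scores in 0-1,
    where a higher score means the guess sits closer to the guesses
    for the other placenames.
    """
    capped = {name: pts[:MAX_CANDIDATES] for name, pts in candidates.items()}
    scores = {}
    for name, pts in capped.items():
        # Every candidate belonging to any other placename in the document.
        others = [q for other, qs in capped.items() if other != name for q in qs]
        scores[name] = []
        for p in pts:
            if not others:
                scores[name].append(0.0)  # nothing to cluster with
                continue
            mean_d = sum(haversine_km(p, q) for q in others) / len(others)
            # Log scaling: short distances map near 1, long ones near 0.
            scaled = 1.0 - math.log1p(mean_d) / math.log1p(MAX_DISTANCE_KM)
            scores[name].append(max(0.0, min(1.0, scaled)))
    return scores


if __name__ == "__main__":
    doc = {
        "Perth": [(56.395, -3.431), (-31.95, 115.86)],  # Scotland, Australia
        "Dundee": [(56.462, -2.970)],
        "Stirling": [(56.117, -3.937)],
    }
    print(clusteriness(doc))
```

In the example, the Scottish Perth should score higher than the
Australian one, because the other places named alongside it are nearby.
The number of distance calculations grows with the product of the
candidate counts, which is why some cap on candidates is needed.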
Our collaborators at LTG have a paper in the works which explains the
inner workings of the geoparser in much more depth. I hope to be able to
link to it from the site when it's publishable, and I'll send a link here.
As the clusteriness and popularity properties are already taken into
account in the ranking order of the search results, I am not sure how
useful it is to others that we expose them, and likely we should just
remove them from the feeds of results.
cheers,
jo
--
Jo Walsh
Unlock places - http://unlock.edina.ac.uk/
phone: +44 (0)131 650 2973
skype: metazool
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.