[OSGeo-UK] Unlock Text
Andrew Larcombe
andrew at andrewlarcombe.co.uk
Mon Jan 18 18:39:45 EST 2010
On 7 Jan 2010, at 16:40, Jo Walsh wrote:
> dear Andrew,
>
> Andrew Larcombe wrote:
>> This is great, thanks. What is the relevance/meaning/scale of the
>> score, clusteriness etc attributes that are returned with the
>> places in the xml?
>
> Right, these are byproducts of the reasoning that the "georesolver"
> stage of the geoparsing process uses to figure out, given a list of
> candidate matches for each of a set of placenames, which one is
> most likely to be "right".
>
> The internals are a black box to me, but this is my rough
> understanding of how it works -
>
> The places named in a document are reasonably likely to be
> clustered together. The 'clusteriness' is a measure of how close
> each georeference for a placename is to all the other
> georeferences for the other placenames.
> If there are a lot of names each with a lot of candidates, it
> quickly gets quite compute-intensive, so it tops out after the
> first 20 or so guesses for each place. The scaling is "in the range
> 0-1, using logarithmic scaling". This, along with other stats like
> population, is used to help rank the most likely guesses for
> locations of placenames.
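
As a rough illustration of the "closeness" idea described above, here is
a minimal Python sketch that computes the great-circle distance from each
Newcastle candidate to the Durham candidate, using the lat/long values
from the XML sample later in this message. The function names and the
choice of the haversine formula are assumptions made for illustration;
nothing here is taken from the georesolver itself. In this two-place
sample the distances come out close to the raw clusteriness values, which
fits the description above, but that is only an observation from one
document.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km on a spherical Earth (assumed radius)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Coordinates copied from the XML sample in this message.
DURHAM = (54.776762936846, -1.57565832138062)
NEWCASTLE_CANDIDATES = {
    "geonames:2641673": (54.9732787391176, -1.61396026611328),  # type="ppl"
    "geonames:3333174": (55.0, -1.6666667),                     # type="civil"
}

def distances_to_durham():
    """Map each Newcastle candidate's gazref to its distance from Durham."""
    return {
        gazref: haversine_km(lat, lon, *DURHAM)
        for gazref, (lat, lon) in NEWCASTLE_CANDIDATES.items()
    }

if __name__ == "__main__":
    for gazref, km in distances_to_durham().items():
        print(f"{gazref}: {km:.2f} km from Durham")
```

With many placenames and many candidates each, the same pairwise
comparison grows quickly, which matches the remark above about capping
the number of guesses per place.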
Hi Jo,
Thanks for the reply. I understand some of what is involved in
carrying out geoparsing tasks. Having looked again at the output, I'm
particularly interested in the score ranking. The document I uploaded
mentioned two places: Newcastle upon Tyne, which was given a score of
~2.09, and Durham, which scored ~1.96 (see XML snippet below). What
I'm trying to do is ascertain how these scores are scaled, so that I
can understand how best to identify and deal with high- and
low-scoring places in my application.
Cheers,
Andrew
#####
<placenames>
<placename name="Newcastle upon Tyne" id="1">
  <place rank="1" score="2.085720778" scaled_contained_by="0"
    scaled_contains="0" scaled_near="0" pop="192382"
    name="Newcastle upon Tyne" gazref="geonames:2641673" type="ppl"
    lat="54.9732787391176" long="-1.61396026611328" in-cc="GB"
    clusteriness="21.98994879" scaled_clusteriness="0.828887891"
    clusteriness_rank="1" scaled_pop="0.6568328871" scaled_type="0.6"/>
  <place rank="2" score="1.196748007" scaled_contained_by="0"
    scaled_contains="0" scaled_near="0"
    name="City and Borough of Newcastle upon Tyne"
    gazref="geonames:3333174" type="civil" lat="55.0"
    long="-1.6666667" in-cc="GB" clusteriness="25.49787491"
    scaled_clusteriness="0.7967480069" clusteriness_rank="2"
    scaled_pop="0" scaled_type="0.4"/>
</placename>
<placename name="Durham" id="2">
  <place rank="1" score="1.960816214" scaled_contained_by="0"
    scaled_contains="0" scaled_near="0" pop="45696" name="Durham"
    gazref="geonames:2650628" type="ppl" lat="54.776762936846"
    long="-1.57565832138062" in-cc="GB" clusteriness="21.99474071"
    scaled_clusteriness="0.8288405767" clusteriness_rank="2"
    scaled_pop="0.5319756372" scaled_type="0.6"/>
#####
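
One thing that can be checked directly from the three <place> elements
above: in each of them, the score attribute equals the sum of the
scaled_* attributes to the precision shown. The Python sketch below
tests that against a trimmed copy of the sample. Closing tags have been
added and unrelated attributes dropped so that it parses; the helper
names scaled_sum and check_scores are made up for this example. Whether
the georesolver always combines the components this way is an
assumption, not something the output confirms.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the sample above (closing tags added here).
SAMPLE = """
<placenames>
  <placename name="Newcastle upon Tyne" id="1">
    <place rank="1" score="2.085720778" scaled_contained_by="0"
      scaled_contains="0" scaled_near="0"
      scaled_clusteriness="0.828887891"
      scaled_pop="0.6568328871" scaled_type="0.6"/>
    <place rank="2" score="1.196748007" scaled_contained_by="0"
      scaled_contains="0" scaled_near="0"
      scaled_clusteriness="0.7967480069"
      scaled_pop="0" scaled_type="0.4"/>
  </placename>
  <placename name="Durham" id="2">
    <place rank="1" score="1.960816214" scaled_contained_by="0"
      scaled_contains="0" scaled_near="0"
      scaled_clusteriness="0.8288405767"
      scaled_pop="0.5319756372" scaled_type="0.6"/>
  </placename>
</placenames>
"""

def scaled_sum(place):
    """Add up every attribute whose name starts with 'scaled_'."""
    return sum(float(value) for key, value in place.attrib.items()
               if key.startswith("scaled_"))

def check_scores(xml_text, tolerance=1e-6):
    """Return (name, rank, score, scaled sum, matches?) for each <place>."""
    rows = []
    root = ET.fromstring(xml_text)
    for placename in root.iter("placename"):
        for place in placename.iter("place"):
            score = float(place.get("score"))
            total = scaled_sum(place)
            rows.append((placename.get("name"), int(place.get("rank")),
                         score, total, abs(score - total) <= tolerance))
    return rows

if __name__ == "__main__":
    for row in check_scores(SAMPLE):
        print(row)
```

If the pattern holds more generally, and if every scaled_* component lies
in the 0-1 range mentioned for clusteriness, a score would fall somewhere
between 0 and the number of components (six in this output). That would
need confirming against more documents before relying on it for
thresholds.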
--
Andrew Larcombe
Freelance Geospatial, Database & Web Programming
web: http://www.andrewlarcombe.co.uk : http://blog.andrewl.net
email: andrew at andrewlarcombe.co.uk
mob: +44 (7760) 258623
icq: 306690163