[OSGeo-UK] Unlock Text

Andrew Larcombe andrew at andrewlarcombe.co.uk
Mon Jan 18 18:39:45 EST 2010


On 7 Jan 2010, at 16:40, Jo Walsh wrote:

> dear Andrew,
>
> Andrew Larcombe wrote:
>> This is great, thanks. What is the relevance/meaning/scale of the  
>> score, clusteriness etc attributes that are returned with the  
>> places in the xml?
>
> Right, these are byproducts of the reasoning that the "georesolver"  
> stage of the geoparsing process uses to figure out, given a list of  
> candidate matches for each of a set of placenames, which one is  
> most likely to be "right".
>
> The internals are a black box to me, but this is my rough  
> understanding of how it works -
>
> The places named in a document are reasonably likely to be  
> clustered together. The 'clusteriness' is a measure of how close  
> each georeference for a placename is to all the other  
> georeferences to placenames.
> If there are a lot of names each with a lot of candidates, it  
> quickly gets quite compute-intensive, so it tops out after the  
> first 20 or so guesses for each place. The scaling is "in the range  
> 0-1, using logarithmic scaling". This, along with other stats like  
> population, is used to help rank the most likely guesses for  
> locations of placenames.



Hi Jo,

Thanks for the reply. I understand some of what is involved in  
carrying out geoparsing tasks. Having looked again at the output, I'm  
particularly interested in the score ranking. The document I uploaded  
mentioned two places: Newcastle upon Tyne, which was given a score  
of ~2.09, and Durham, which scored ~1.96 (see XML snippet below). What  
I'm trying to do is ascertain how these scores are scaled, so that  
I can understand how best to identify and deal with high-scoring and  
low-scoring places in my application.

Cheers,

Andrew



#####
<placenames>
<placename name="Newcastle upon Tyne" id="1">
<place rank="1" score="2.085720778" scaled_contained_by="0"
  scaled_contains="0" scaled_near="0" pop="192382"
  name="Newcastle upon Tyne" gazref="geonames:2641673" type="ppl"
  lat="54.9732787391176" long="-1.61396026611328" in-cc="GB"
  clusteriness="21.98994879" scaled_clusteriness="0.828887891"
  clusteriness_rank="1" scaled_pop="0.6568328871" scaled_type="0.6"/>
<place rank="2" score="1.196748007" scaled_contained_by="0"
  scaled_contains="0" scaled_near="0"
  name="City and Borough of Newcastle upon Tyne"
  gazref="geonames:3333174" type="civil"
  lat="55.0" long="-1.6666667" in-cc="GB"
  clusteriness="25.49787491" scaled_clusteriness="0.7967480069"
  clusteriness_rank="2" scaled_pop="0" scaled_type="0.4"/>
</placename>
<placename name="Durham" id="2">
<place rank="1" score="1.960816214" scaled_contained_by="0"
  scaled_contains="0" scaled_near="0" pop="45696" name="Durham"
  gazref="geonames:2650628" type="ppl"
  lat="54.776762936846" long="-1.57565832138062" in-cc="GB"
  clusteriness="21.99474071" scaled_clusteriness="0.8288405767"
  clusteriness_rank="2" scaled_pop="0.5319756372" scaled_type="0.6"/>
#####
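
A quick arithmetic check on the figures in the snippet above. This is only
an observation from this one response, not documented behaviour of the
georesolver: for all three candidates the score equals the sum of the
scaled_* attributes, and the raw clusteriness for Newcastle upon Tyne is
roughly the great-circle distance in km to Durham. The Python sketch below
reproduces both checks. The function names (scaled_sum, haversine_km) and the
6371 km Earth radius are my own assumptions for illustration; none of them
come from the service.

```python
import math

# Attribute values copied by hand from the XML snippet above.
places = [
    {"name": "Newcastle upon Tyne", "score": 2.085720778,
     "scaled_contained_by": 0.0, "scaled_contains": 0.0, "scaled_near": 0.0,
     "scaled_clusteriness": 0.828887891, "scaled_pop": 0.6568328871,
     "scaled_type": 0.6},
    {"name": "City and Borough of Newcastle upon Tyne", "score": 1.196748007,
     "scaled_contained_by": 0.0, "scaled_contains": 0.0, "scaled_near": 0.0,
     "scaled_clusteriness": 0.7967480069, "scaled_pop": 0.0,
     "scaled_type": 0.4},
    {"name": "Durham", "score": 1.960816214,
     "scaled_contained_by": 0.0, "scaled_contains": 0.0, "scaled_near": 0.0,
     "scaled_clusteriness": 0.8288405767, "scaled_pop": 0.5319756372,
     "scaled_type": 0.6},
]


def scaled_sum(place):
    """Add up every attribute whose name starts with 'scaled_'."""
    return sum(v for k, v in place.items() if k.startswith("scaled_"))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two lat/long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Compare the reported score with the sum of the scaled_* attributes.
    for p in places:
        print(p["name"], p["score"], round(scaled_sum(p), 9))
    # Distance between the rank-1 Newcastle and Durham candidates,
    # for comparison with the reported clusteriness values.
    print(haversine_km(54.9732787391176, -1.61396026611328,
                       54.776762936846, -1.57565832138062))
```

If that pattern holds in general, the maximum possible score would depend on
how many scaled_* components are non-zero for a given place, which would
matter when choosing a threshold for "high" and "low" scores. That still
needs confirming against the georesolver itself.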

-- 
Andrew Larcombe
Freelance Geospatial, Database & Web Programming

web: http://www.andrewlarcombe.co.uk : http://blog.andrewl.net
email: andrew at andrewlarcombe.co.uk
mob: +44 (7760) 258623
icq: 306690163






