[OSGeo-Discuss] The existence (and value of) "clean" geocoding tools?

Andrew Turner ajturner at highearthorbit.com
Fri Sep 26 04:51:02 PDT 2008


It seems as though the "where is a good geocoding engine" question typically
devolves into either "you need data" or "it's tough, and here's an
explicit explanation why". I'm surprised that there are rarely answers
(or projects) that say, "here's a project; it needs data, but just get
it into this form. It has these shortcomings, but here's how to
configure it".

The 2002 Google Code contest is a good start, and so is the PostGIS-based
one. SRC open-sourced a C++ one, but I've heard mixed reviews.
Just started playing with it myself:

http://www.extendthereach.com/products/OSGeocoder.srct

Anyways, seems like there is a severe need for a good, supported
geocoder. It's a major missing piece in the Open-Source Geo stack.

Andrew


On Thu, Sep 25, 2008 at 7:44 AM, Stephen Woodbridge
<woodbri at swoodbridge.com> wrote:
> David Dearing wrote:
>>
>> Hi.  I just recently stumbled across OSGeo and have poked around to try
>> and get a feel for the different projects, but still have a lingering
>> question.  Forgive me if this isn't the appropriate channel to be asking
>> this.
>>
>> It seems that there is a solid focus on mapping, image manipulation, and
>> geometric processing at OSGeo.  And, in the more broad world including
>> non-open source projects, there are a lot of tools available for the mass
>> production of geotagged or geocoded documents.  However, the accuracy of
>> these systems, while good, doesn't seem sufficient when accuracy is at a
>> premium (from what I've seen they tend to focus on volume).
>>
>> Are there any existing tools that can be used to tag/code documents,
>> perhaps sacrificing the mass-produced aspect for better accuracy?  Have I
>> just missed/overlooked some existing tool(s) that meet this description?
>>  Or, am I in the minority in wanting to produce fewer "clean"
>> geocoded/tagged documents rather than many "pretty good" documents?
>
> Have you looked at http://ofb.net/~egnor/google.html
> http://www.pagcgeo.org/
>
>
> Geocoding is NOT exact; in fact, it deals with a very messy area of natural
> language parsing. While it is more constrained than free text, it still has
> to deal with all the issues of typos, abbreviations, punctuation, etc., and
> then it has to match the user input to some vendor data.
>
> For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44, Highway
> 44, State Highway 44, Rt 44, and various other abbreviations for Highway,
> simple typo errors, adding N, N., North, S, S., South, etc designations to
> the Highway, adding Alt., Bus., Byp., etc and on it goes. You also need to
> deal with accented characters, that are sometimes entered without accents.
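
To make the highway example concrete, here is a minimal sketch in Python of
what a standardizer might do with those variants. The lookup tables, the
standardize_highway name and the "HWY 44" output form are all my own
assumptions for illustration, not taken from any existing geocoder. It also
deliberately throws away the state name, which a real standardizer would
keep in a separate field.

```python
import re
import unicodedata

# Hypothetical lookup tables; a real standardizer needs far larger ones.
HIGHWAY_WORDS = {"highway", "hwy", "route", "rt", "rte", "state", "al", "alabama"}
DIRECTIONS = {"n": "N", "north": "N", "s": "S", "south": "S",
              "e": "E", "east": "E", "w": "W", "west": "W"}
QUALIFIERS = {"alt": "ALT", "alternate": "ALT", "bus": "BUS",
              "business": "BUS", "byp": "BYP", "bypass": "BYP"}


def strip_accents(text):
    """Drop combining marks so accented and unaccented input compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardize_highway(raw):
    """Return a canonical form such as 'HWY 44 ALT N', or None if unrecognized."""
    text = strip_accents(raw).lower()
    # Splitting on letters/digits discards punctuation such as '.' and '-'.
    tokens = re.findall(r"[a-z]+|\d+", text)
    number = direction = qualifier = None
    for tok in tokens:
        if tok.isdigit():
            number = number or tok          # keep the first number seen
        elif tok in DIRECTIONS:
            direction = DIRECTIONS[tok]
        elif tok in QUALIFIERS:
            qualifier = QUALIFIERS[tok]
        elif tok in HIGHWAY_WORDS:
            continue                        # noise word, already implied by 'HWY'
        else:
            return None                     # unknown word: not a highway we know
    if number is None:
        return None
    parts = ["HWY", number]
    if qualifier:
        parts.append(qualifier)
    if direction:
        parts.append(direction)
    return " ".join(parts)


variants = ["AL 44", "Alabama 44", "AL-44", "Alabama Highway 44",
            "Highway 44", "State Highway 44", "Rt 44"]
standardized = {standardize_highway(v) for v in variants}
```

The same function would be run over the vendor data at load time and over
each incoming request, so that both sides end up in the same form before
matching.
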
>
> In a geocoder, you typically have a standardizer that sorts out all that
> craziness. Then when you load the geocoder, you standardize the vendor data
> and store it in a standard form. When you get a geocode request, you
> standardize the incoming request and then try to match the standard form
> with the vendor data, which is also in standard form. As an alternative to a
> standardizer, some geocoders use statistical record-matching techniques.
>
> You can also use techniques like Metaphone/Soundex codes to do fuzzy
> searching and then use Levenshtein distance to score the possible matched
> results for how close they are to the request.
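
A rough sketch of that two-step idea in Python, under my own assumptions:
Soundex is used as the coarse filter (Metaphone would slot in the same way),
and Levenshtein distance ranks whatever survives. The fuzzy_candidates
function and the sample street names are made up for illustration.

```python
# Standard American Soundex digit groups.
_CODES = {}
for _digit, _letters in (("1", "bfpv"), ("2", "cgjkqsxz"), ("3", "dt"),
                         ("4", "l"), ("5", "mn"), ("6", "r")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Four-character Soundex code, e.g. 'Robert' and 'Rupert' both give R163."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":      # h and w do not separate equal codes
            prev = digit
    return (code + "000")[:4]


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) between two strings."""
    prev_row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        row = [i]
        for j, cb in enumerate(b, 1):
            row.append(min(prev_row[j] + 1,                # deletion
                           row[j - 1] + 1,                 # insertion
                           prev_row[j - 1] + (ca != cb)))  # substitution
        prev_row = row
    return prev_row[-1]


def fuzzy_candidates(query, names):
    """Keep names sharing the query's Soundex code, closest edit distance first."""
    key = soundex(query)
    hits = [n for n in names if soundex(n) == key]
    return sorted(hits, key=lambda n: levenshtein(query.lower(), n.lower()))


# Made-up street names; "Mian" stands in for a typo of "Main".
ranked = fuzzy_candidates("Mian", ["Main", "Maine", "Mane", "Oak"])
```

The phonetic code keeps the candidate set small, and the edit distance gives
a score you can threshold or show to the user.
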
>
> You need to be prepared to handle multiple results to a query; for example,
> you search for Oak St. but only find North Oak Street and South Oak Street.
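
One way to picture the Oak St. case, as a small Python sketch: the matcher
returns a list instead of a single hit, and only relaxes the directional
prefix when the request did not give one. The StreetRecord fields and the
find_matches function are hypothetical names of mine, and the records are
assumed to be already standardized.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class StreetRecord:
    prefix: Optional[str]   # directional prefix such as "N" or "S", if any
    name: str               # standardized street name, e.g. "OAK"
    suffix: str             # standardized street type, e.g. "ST"


def find_matches(request: StreetRecord,
                 records: List[StreetRecord]) -> List[StreetRecord]:
    """Return every plausible match; the caller decides what to do with ties."""
    exact = [r for r in records if r == request]
    if exact:
        return exact
    # No exact hit: ignore the prefix only if the request left it out.
    if request.prefix is None:
        return [r for r in records
                if r.name == request.name and r.suffix == request.suffix]
    return []


vendor = [StreetRecord("N", "OAK", "ST"),
          StreetRecord("S", "OAK", "ST"),
          StreetRecord(None, "ELM", "ST")]
oak_matches = find_matches(StreetRecord(None, "OAK", "ST"), vendor)
```

Here the request for plain Oak St comes back with both the North and South
records, and it is up to the application to ask the user, pick by proximity,
or report the result as ambiguous.
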
>
> And all this can only happen after you have tagged some text in a document,
> if you are doing tagging. You mention accuracy is important; well, how do you
> determine what is "right"? Remember the Oak St. example above.
>
> Anyway this is a good place to discuss this topic.
>
> -Stephen Woodbridge
>  http://imaptools.com/



-- 
Andrew Turner
mobile: 248.982.3609
andrew at mapufacture.com
http://highearthorbit.com

http://mapufacture.com           Helping build the Geospatial Web
Introduction to Neogeography - http://oreilly.com/catalog/neogeography


