[OSGeo-Discuss] Automatic geocoding of PDF documents
Stark Hans-Jörg
hansjoerg.stark at fhnw.ch
Sat Jan 14 14:05:09 PST 2012
Perhaps OpenAddresses (www.openaddresses.org) may also be helpful. It is far from complete yet, but for some regions the data is fairly dense (and, if donated, complete). It also provides geocoding as REST services (see the wiki).
cheers,
hj
On 14.01.2012 at 20:59, "Andrew Turner" <ajturner at highearthorbit.com> wrote:
> On Fri, Jan 13, 2012 at 6:00 PM, slesage <slesage at geo.gob.bo> wrote:
>> Hi,
>>
>> Does anybody know of any open-source software dedicated to automatic
>> geocoding of text documents? The idea of that "black box" would be:
>> * give, as an input, a text document or a PDF,
>> * receive, as an output, a list of place names with their coordinates / a
>> map of POIs corresponding to those places.
>>
>> Using the GeoNames database (http://www.geonames.org/), the solution appears
>> to be just a full-text search, which could be done using Lucene
>> (https://lucene.apache.org/java/docs/index.html).
>>
>> I found the MetaCarta solution
>> (http://www.metacarta.com/products-platform-geotag.htm) but couldn't find
>> any open-source solution.
>
> The reason that there isn't an open-source solution is because it is
> Very Difficult. Even geocoding is difficult and until a short while
> ago there weren't any decent open-source geocoders. So we worked with
> Schuyler (formerly of Metacarta) to build an open-source one [1].
>
> Your idea of using the GeoNames gazetteer with Apache Lucene is interesting
> and I think I've seen it suggested before. However, at best it will
> find location names but will be missing any logic for disambiguation
> of words or relative locations. So you could likely find that "Paris"
> was mentioned, but not know whether it's Paris, France or Paris, Texas, US.
>
> Gisgraphy [2] is an open-source option that says it provides Full-text
> searching. I don't know more about it though.
>
> Definitely share what else you find or try.
>
> Andrew
>
>
> [1] https://github.com/geocommons/geocoder
> [2] http://www.gisgraphy.com/download/index.htm
>
>>
>> Thanks for your suggestions.
>>
>> Sylvain Lesage.
>> _______________________________________________
>> Discuss mailing list
>> Discuss at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/discuss
>
>
>
> --
> Andrew Turner
> mobile: 248.982.3609
> andrew at fortiusone.com
> http://highearthorbit.com
>
> http://geocommons.com Helping build the Geospatial Web
> Introduction to Neogeography - http://oreilly.com/catalog/neogeography
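To make the gazetteer-lookup idea from the thread concrete, here is a minimal sketch in Python. It is not Lucene and not any of the tools mentioned above. The three-entry GAZETTEER, its rounded coordinates, and the function names build_index and find_places are all assumptions for illustration; a real run would load a GeoNames dump instead. It shows that plain name matching finds "Paris" but returns every candidate, which is exactly the ambiguity Andrew describes.

```python
import re
from collections import defaultdict

# Toy gazetteer (an assumption for illustration): (name, country, lat, lon).
# Coordinates are rounded approximations, not authoritative values.
GAZETTEER = [
    ("Paris", "FR", 48.86, 2.35),
    ("Paris", "US", 33.66, -95.56),
    ("La Paz", "BO", -16.50, -68.15),
]


def build_index(entries):
    """Group gazetteer entries by lower-cased place name."""
    index = defaultdict(list)
    for name, country, lat, lon in entries:
        index[name.lower()].append((name, country, lat, lon))
    return index


def find_places(text, index):
    """Return (name, candidates) for each gazetteer name found in text.

    Matching is on whole words only and ignores case. No disambiguation
    is attempted: every entry sharing the name is returned as a candidate.
    """
    hits = []
    for key, candidates in index.items():
        pattern = r"\b" + re.escape(key) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append((candidates[0][0], candidates))
    return hits


if __name__ == "__main__":
    idx = build_index(GAZETTEER)
    for name, candidates in find_places("The workshop moved from Paris to La Paz.", idx):
        print(name, candidates)
```

Choosing among the candidates (for example by country hints elsewhere in the document, or by population) is the hard part and is left out here on purpose.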