[OSGeo-Discuss] Automatic geocoding of PDF documents

Andrew Turner ajturner at highearthorbit.com
Sat Jan 14 11:59:11 PST 2012


On Fri, Jan 13, 2012 at 6:00 PM, slesage <slesage at geo.gob.bo> wrote:
> Hi,
>
> Does anybody know of any open-source software dedicated to automatic
> geocoding of text documents? The idea of that "black box" would be:
> * give, as an input, a text document or a PDF,
> * receive, as an output, a list of place names with their coordinates / a
> map of POI corresponding to those places.
>
> Using the geonames database (http://www.geonames.org/), the solution appears
> to be only a fulltext search, that could be done using Lucene
> (https://lucene.apache.org/java/docs/index.html).
>
> I found the metacarta solution
> (http://www.metacarta.com/products-platform-geotag.htm) but couldn't find
> any opensource solution.

There isn't an open-source solution because it is Very Difficult.
Even plain geocoding is difficult, and until a short while ago there
weren't any decent open-source geocoders. So we worked with Schuyler
(formerly of Metacarta) to build an open-source one [1].

Your idea of using the Geonames gazetteer with Apache Lucene is
interesting, and I think I've seen it suggested before. However, at
best it will find location names; it will be missing any logic for
disambiguation of words or for relative locations. So you could likely
find that "Paris" was mentioned, but you couldn't tell whether it's
Paris, France or Paris, Texas, US.
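
To make the ambiguity concrete, here is a minimal sketch in Python of
a plain gazetteer lookup. It is not Lucene and not how any of the
tools mentioned here work; the three gazetteer rows, the approximate
coordinates, and the function names are all made up for illustration.
A real run would presumably load names from a Geonames dump instead.

```python
import re
from collections import defaultdict

# Hypothetical hand-written gazetteer rows: (name, country, lat, lon).
# Coordinates are approximate and only here for illustration.
GAZETTEER = [
    ("Paris", "FR", 48.8566, 2.3522),
    ("Paris", "US", 33.6609, -95.5555),
    ("La Paz", "BO", -16.5, -68.15),
]


def build_index(rows):
    """Map each lower-cased place name to all of its candidate locations."""
    index = defaultdict(list)
    for name, country, lat, lon in rows:
        index[name.lower()].append((country, lat, lon))
    return index


def find_places(text, index):
    """Return {name: [candidates]} for every gazetteer name found in text."""
    lowered = text.lower()
    hits = {}
    for name, candidates in index.items():
        # Whole-word match only, so "paris" does not match "comparison".
        if re.search(r"\b" + re.escape(name) + r"\b", lowered):
            hits[name] = candidates
    return hits


INDEX = build_index(GAZETTEER)
RESULT = find_places("The meeting moved from La Paz to Paris.", INDEX)

for place, candidates in RESULT.items():
    print(place, candidates)
```

The lookup finds "paris" but hands back two candidates, and nothing
in it can choose between them. Picking one needs extra context such
as nearby place names, population, or the country of the document,
and that is the part a pure full-text search does not give you.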

Gisgraphy [2] is an open-source option that says it provides full-text
search. I don't know much more about it, though.

Definitely share what else you find or try.

Andrew


[1] https://github.com/geocommons/geocoder
[2] http://www.gisgraphy.com/download/index.htm

>
> Thanks for your suggestions.
>
> Sylvain Lesage.



-- 
Andrew Turner
mobile: 248.982.3609
andrew at fortiusone.com
http://highearthorbit.com

http://geocommons.com           Helping build the Geospatial Web
Introduction to Neogeography - http://oreilly.com/catalog/neogeography


