[OSGeo-Discuss] Automatic geocoding of PDF documents

Bob Basques Bob.Basques at ci.stpaul.mn.us
Tue Jan 17 09:24:15 PST 2012


All, 

I did something similar to this a couple of years ago (for fun, of all things): I parsed Craigslist listings and used the city location information (from a Census place-name SHP file, I think) to plot a location for each item for sale on a map.  The plan at the time was to mapify Craigslist and be able to do geo-filtered queries.  I got it to work, but got onto other things and never went back to it. 

I used Perl, and as I recall, it wasn't that long of a script, maybe 50 lines or so.  I can look for the code if there is interest. 
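A minimal Python sketch of the lookup step described above. It is not the original Perl script, which isn't shown in this thread. It assumes each listing title ends with its location in parentheses, and it uses a tiny hypothetical PLACES table with approximate coordinates in place of the Census place-name file. The names `PLACES`, `normalize` and `locate_listing` are illustrative only.

```python
import re

# Hypothetical stand-in for a Census place-name shapefile: name -> (lat, lon).
# Coordinates are approximate and for illustration only.
PLACES = {
    "st paul": (44.95, -93.09),
    "minneapolis": (44.98, -93.27),
}

# Assumed listing-title format: "<item> - $<price> (<city>)".
LOCATION_RE = re.compile(r"\(([^()]+)\)\s*$")


def normalize(name):
    """Lower-case, drop periods and collapse whitespace ('St. Paul' -> 'st paul')."""
    return " ".join(name.lower().replace(".", "").split())


def locate_listing(title):
    """Return (place, lat, lon) for a listing title, or None if no known place is found."""
    match = LOCATION_RE.search(title)
    if not match:
        return None
    place = normalize(match.group(1))
    if place not in PLACES:
        return None
    lat, lon = PLACES[place]
    return place, lat, lon


if __name__ == "__main__":
    for title in ["Oak desk - $40 (St. Paul)", "Bike - $100 (Duluth)", "Free couch"]:
        print(title, "->", locate_listing(title))
```

Exact matching after normalization keeps the sketch short. A full place-name file would also need state qualifiers to separate towns that share a name.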

bobb 



>>> Stephen Woodbridge <woodbri at swoodbridge.com> wrote:


Here are some more links that you might find useful.

http://www.biomedcentral.com/1471-2105/10/385
http://www.ijcte.org/papers/005.pdf
http://www.e-perimetron.org/Vol_4_1/Martins_et_al.pdf
http://nltk.googlecode.com/svn/trunk/doc/book/ch07.html

I cannot find a potentially excellent reference from about 5 years ago,
possibly by a GSoC student who I think was then hired by Google.
Basically, he wrote a document parser that looked for location
references in the text and then tagged the document with locations and
lat/longs. If I remember correctly, it was a gazetteer-based system; it
is open source and was also online somewhere.
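The gazetteer-based tagging described above can be sketched as a longest-match scan over the words of a document. This is a minimal illustration under stated assumptions, not the system Steve mentions. GAZETTEER is a hypothetical two-entry table with approximate coordinates, only ASCII letters are tokenized, and there is no disambiguation between places that share a name.

```python
import re

# Hypothetical toy gazetteer: lower-case place name -> approximate (lat, lon).
# A real tagger would load a full gazetteer such as a GeoNames dump.
GAZETTEER = {
    "york": (53.96, -1.08),
    "new york": (40.71, -74.01),
}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def tag_locations(text):
    """Tag place names in text, preferring the longest gazetteer match.

    Returns a list of (name, lat, lon, char_offset) tuples. Tokens are runs
    of ASCII letters; matching is case-insensitive and exact.
    """
    tokens = [(m.group(0), m.start()) for m in re.finditer(r"[A-Za-z]+", text)]
    tags = []
    i = 0
    while i < len(tokens):
        # Try the longest word window first, then shorter ones.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(word.lower() for word, _ in tokens[i:i + n])
            if candidate in GAZETTEER:
                lat, lon = GAZETTEER[candidate]
                tags.append((candidate, lat, lon, tokens[i][1]))
                i += n
                break
        else:
            # No gazetteer name starts at this token; move on.
            i += 1
    return tags


if __name__ == "__main__":
    print(tag_locations("Flights from York to New York."))
```

Trying the longest window first is what keeps "New York" from also being tagged as "York".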

-Steve

On 1/13/2012 6:00 PM, slesage wrote:
> Hi,
>
> does anybody know of any open-source software dedicated to automatic
> geocoding of text documents? The idea of that "black box" would be:
> * give, as an input, a text document or a PDF,
> * receive, as an output, a list of place names with their coordinates,
> or a map of POIs corresponding to those places.
>
> With the GeoNames database (http://www.geonames.org/), the solution
> appears to be just a full-text search, which could be done using Lucene
> (https://lucene.apache.org/java/docs/index.html).
>
> I found the metacarta solution
> (http://www.metacarta.com/products-platform-geotag.htm) but couldn't
> find any open-source solution.
>
> Thanks for your suggestions.
>
> Sylvain Lesage.
> _______________________________________________
> Discuss mailing list
> Discuss at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/discuss
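The GeoNames idea in the quoted message can be approximated without Lucene by indexing every name and alternate name from the GeoNames tab-delimited dump. This sketch assumes the dump's first six columns are geonameid, name, asciiname, alternatenames, latitude and longitude. The two sample rows, including their ids, are made up. A plain dict stands in for Lucene, so the sketch only does exact, case-insensitive lookups, not tokenized or fuzzy full-text search.

```python
import csv
from collections import defaultdict

# Made-up sample rows in the assumed GeoNames column order (ids are
# hypothetical, coordinates approximate); real dumps have more columns.
SAMPLE = (
    "1\tSaint Paul\tSaint Paul\tSt. Paul,St Paul\t44.95\t-93.09\n"
    "2\tMontréal\tMontreal\tMontreal\t45.50\t-73.57\n"
)


def build_name_index(lines):
    """Map each lower-cased name, ASCII name and alternate name to its records.

    Reads only the first six tab-separated columns: geonameid, name,
    asciiname, alternatenames (comma-separated), latitude, longitude.
    """
    index = defaultdict(list)
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        geonameid, name, asciiname, alternates, lat, lon = row[:6]
        record = (int(geonameid), name, float(lat), float(lon))
        # A set avoids adding the same record twice when names repeat.
        keys = {name.lower(), asciiname.lower()}
        keys.update(a.strip().lower() for a in alternates.split(",") if a.strip())
        for key in keys:
            index[key].append(record)
    return index


if __name__ == "__main__":
    index = build_name_index(SAMPLE.splitlines())
    print(index.get("st. paul"))
    print(index.get("montreal"))
```

Indexing alternate names is what lets "St. Paul" or "Montreal" in a document resolve to the same record as the official name.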


