[OSGeo-Discuss] Automatic geocoding of PDF documents

Olivier ERTZ olivier.ertz at heig-vd.ch
Wed Jan 18 11:01:42 PST 2012


Sylvain,
we had a similar need in one of our research projects. We adopted a 
semi-automatic approach of this kind: an automatic geotagging service, 
plus a wizard that lets the user review what the service returns and 
manage their own geotags.

There are two parts in our solution:
- the geotagging service: it aggregates the results from several 
services such as Alchemy and OpenCalais, and you can plug in as many 
more as you want through a simple interface
- the geotagging wizard: a "web widget" you can easily insert into, or 
connect to, your web app. It helps the user geotag the text: view the 
automatic results from the service on a map, delete geotags, move them, 
choose the right place when several places are associated with a tag 
(a consequence of the aggregation done by the service), and create new 
geotags on the map manually or with a geocoding helper, among other things.
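
To give an idea of what aggregating several services behind a simple 
interface can look like, here is a minimal sketch in Python. It is not 
our actual code: the Geotag type, the aggregate() function and the two 
stand-in backends are hypothetical names made up for illustration, the 
backends make no real calls to Alchemy or OpenCalais, and the 
coordinates are approximate.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Geotag:
    name: str    # place name as found in the text
    lat: float   # latitude in decimal degrees
    lon: float   # longitude in decimal degrees
    source: str  # which backend proposed this candidate


# Hypothetical plug-in interface: a backend is any callable that takes
# the text and returns the geotags it found.
Backend = Callable[[str], List[Geotag]]


def aggregate(text: str, backends: List[Backend]) -> Dict[str, List[Geotag]]:
    """Collect candidates from every backend, grouped by lower-cased name."""
    merged: Dict[str, List[Geotag]] = {}
    for backend in backends:
        for tag in backend(text):
            merged.setdefault(tag.name.lower(), []).append(tag)
    return merged


# Two stand-in backends (assumptions, no network access). They disagree
# on purpose about which "Paris" is meant.
def backend_a(text: str) -> List[Geotag]:
    if "Paris" in text:
        return [Geotag("Paris", 48.8566, 2.3522, "backend_a")]  # Paris, France
    return []


def backend_b(text: str) -> List[Geotag]:
    if "Paris" in text:
        return [Geotag("Paris", 33.6609, -95.5555, "backend_b")]  # Paris, Texas
    return []


candidates = aggregate("We met in Paris last year.", [backend_a, backend_b])
for name, tags in candidates.items():
    # More than one candidate means the user has to choose in the wizard.
    print(name, [(t.source, t.lat, t.lon) for t in tags])
```

In this sketch a tag that ends up with several candidates is exactly 
the case where the wizard asks the user to choose the right place.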

We will soon publish the results, a demo, documentation and the source code.

Best regards,
Olivier.

On 01/17/2012 04:48 PM, slesage wrote:
> On 2012-01-14 15:59, Andrew Turner wrote:
>> On Fri, Jan 13, 2012 at 6:00 PM, slesage <slesage at geo.gob.bo> wrote:
>>> Hi,
>>>
>>> does anybody know of any open-source software dedicated to automatic
>>> geocoding of text documents? The idea of that "black box" would be:
>>> * give, as an input, a text document or a PDF,
>>> * receive, as an output, a list of place names with their
>>> coordinates, or a map of POIs corresponding to those places.
>>>
>>> Using the geonames database (http://www.geonames.org/), the solution
>>> appears to be only a full-text search, which could be done using Lucene
>>> (https://lucene.apache.org/java/docs/index.html).
>>>
>>> I found the metacarta solution
>>> (http://www.metacarta.com/products-platform-geotag.htm) but couldn't
>>> find any open-source solution.
>>
>> The reason that there isn't an open-source solution is because it is
>> Very Difficult. Even geocoding is difficult and until a short while
>> ago there weren't any decent open-source geocoders. So we worked with
>> Schuyler (formerly of Metacarta) to build an open-source one [1].
>>
>> Your idea of using the Geonames gazetteer with Apache Lucene is
>> interesting and I think I've seen it suggested before. However, at best
>> it will find location names but will be missing any logic for
>> disambiguation of words or for relative locations. So you could likely
>> find that "Paris" was mentioned, but not know whether it's Paris, France
>> or Paris, Texas, US.
>>
>> Gisgraphy [2] is an open-source option that says it provides Full-text
>> searching. I don't know more about it though.
>>
>> Definitely share what else you find or try.
>>
>> Andrew
>
> Thanks for the links, Andrew, I will investigate them. I had seen 
> Gisgraphy before, but did not really understand what its purpose 
> is. Has anybody used it? It seems to be developed by only one 
> person; do you think the community is broader?
>
> To refine my ideas on a geocoding tool: I think fully automatic 
> processing would be very difficult, because of disambiguation and 
> the need to fix false positives and false negatives. A 
> semi-automatic approach would certainly be much more efficient, with 
> validation by the user afterwards and a learning engine to record 
> these decisions.
>
> I think that kind of processing would work best as a plugin for a 
> text editor, allowing:
> * geocoding of a word selected by the user (selection -> right click -> 
> georeference, etc.)
> * geocoding of a whole text, with a bubble for each word, and three 
> buttons for post-validation: "OK", "disambiguate" (your example of 
> Paris, Texas), "not a location"
>
> I don't know if that sounds interesting or not. But without a doubt, 
> that means a lot of development! In order not to reinvent the wheel, 
> could anybody give me more hints on the two initiatives you mentioned 
> (geocoding, gisgraphy) so I could better determine which one it 
> would be better to contribute to?
>
> Thanks
>
> Sylvain Lesage


-- 
HEIG-VD, University Of Applied Sciences Western Switzerland
IICT, Institute for Information and Communication Technologies
Email: olivier.ertz at heig-vd.ch
Phone: +41 24 55 77570
Go to: http://www.heig-vd.ch | http://geosysin.iict.ch



