[OSGeo-Discuss] (PDF Geotagging) Re: Fwd: Discuss Digest, Vol
61, Issue 16
kitamoto at nii.ac.jp
Fri Jan 20 10:07:34 EST 2012
I recently joined the list, and I found the recent discussion about open
source geo-tagging quite interesting and helpful, because we are also
working on the "GeoNLP" project, which aims to build an open source
geo-tagging tool. The following is the project web page.
For the moment, we focus on the Japanese language, because Japanese is quite
different from European languages. Roughly speaking, our task is two-fold.
1. Recognition : Morphological analyzers for Japanese have already been
developed to a satisfactory level, so we focus instead on how to
integrate various gazetteers into standard dictionaries used for
2. Resolution : This is the more challenging part. We basically apply some
heuristics (such as the distance between place names and the type of place
names) to disambiguate them, but this sometimes fails because the text
does not contain enough information. (we don't have a post-editing
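
The resolution step can be illustrated with a minimal sketch. This is not the GeoNLP implementation; the `Candidate` class, the `TYPE_PENALTY_KM` table, and the `resolve` function are hypothetical names invented here, and the coordinates are approximate. It only shows the general idea of combining distance to other place names in the same text with a preference by place-name type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    kind: str   # place-name type, e.g. "city" or "station" (hypothetical labels)
    lat: float
    lon: float


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance between two candidates in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(a.lat), math.radians(b.lat)
    dphi = p2 - p1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))


# Assumed weights: a penalty (in km) added per place-name type.
TYPE_PENALTY_KM = {"prefecture": 0.0, "city": 50.0, "station": 100.0}


def resolve(candidates, anchors):
    """Pick the candidate with the lowest cost.

    cost = mean distance to already-resolved place names (anchors)
           + penalty for the candidate's type.
    With no anchors, only the type penalty is left to decide.
    """
    def cost(c):
        dist = sum(haversine_km(c, a) for a in anchors) / len(anchors) if anchors else 0.0
        return dist + TYPE_PENALTY_KM.get(c.kind, 200.0)

    return min(candidates, key=cost)


# "Fuchu" is the name of a city in Tokyo and of a city in Hiroshima.
fuchu_tokyo = Candidate("Fuchu (Tokyo)", "city", 35.669, 139.478)
fuchu_hiroshima = Candidate("Fuchu (Hiroshima)", "city", 34.568, 133.236)
shinjuku = Candidate("Shinjuku", "city", 35.694, 139.703)

best = resolve([fuchu_tokyo, fuchu_hiroshima], anchors=[shinjuku])
```

When the text mentions no other place name (an empty `anchors` list), both "Fuchu" candidates get the same cost and the choice is arbitrary, which is the kind of failure described above.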
We want to make this system open source so that we can collaboratively
develop algorithms and gazetteers. I always feel that many
reinvent-the-wheel hacks have been made in Japan, and yet we still
do not have a good system through which we can share our
experiences and knowledge. I hope GeoNLP will evolve into that kind of system.
We found, however, that making our system open source is more difficult
than we first thought, because dictionaries (gazetteers) are also an
indispensable component of the system. It seems easier to
manage them in a central repository and provide them through a Web
service. In addition, we may need to "personalize" the choice of
gazetteers for different purposes, because overly large gazetteers tend to
introduce words that look like general nouns. We still don't have a good
solution to these problems.
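
To make the "personalization" idea concrete, here is a minimal sketch, not our actual design. The function `build_lexicon`, the gazetteer names, and the toy entries are all hypothetical. It assumes each gazetteer is a mapping from a surface form to an entry ID, and that the caller supplies a list of general nouns to exclude.

```python
def build_lexicon(gazetteers, selected, common_nouns):
    """Merge only the selected gazetteers into one lookup table.

    gazetteers:   {gazetteer_name: {surface_form: entry_id}}
    selected:     names of the gazetteers chosen for this purpose
    common_nouns: surface forms to skip because they read as general nouns
    """
    lexicon = {}
    for name in selected:
        for surface, entry_id in gazetteers[name].items():
            if surface in common_nouns:
                continue  # too ambiguous with ordinary vocabulary
            lexicon.setdefault(surface, []).append((name, entry_id))
    return lexicon


# Toy data for illustration only; the entry IDs are made up.
gazetteers = {
    "prefectures": {"東京": "pref:0001"},
    "stations": {"東京": "sta:0001", "森": "sta:0002"},
}

# 森 also means "forest", so a news-oriented profile might exclude it.
news_lexicon = build_lexicon(
    gazetteers, selected=["prefectures", "stations"], common_nouns={"森"}
)
```

A different purpose, such as tagging railway timetables, could select only the `stations` gazetteer and pass an empty exclusion set.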
The project is still in a pre-alpha state, and the Web API is not yet open
to the public, but the API itself is already working at a reasonable speed.
We are now mapping tweets and online news, and this has become a
simple task: we just pass the original text as a parameter to the API.
An example is the mapping of tweets (sorry, it is in Japanese).
As we make slow progress, we are learning more about the fundamental
difficulty of this research area, but any comments and collaborations
National Institute of Informatics
(2012/01/19 21:12), Jo Walsh wrote:
> hello OSGeoids!
>> Thank you very much, I will try your API as soon as possible. Is there
>> a way to have access to the source code behind your web service ? Or
>> isn't it open source ?
>> On 2012-01-17 12:11, James Reid wrote:
>>> You could check out our Unlock Text service at:
> James nudged me in the direction of this thread.
> The Edinburgh Geoparser is open source *in principle* (GPL).
> In the short term the best way to get hold of a copy is to email Claire
> Grover ( grover at inf.ed.ac.uk ) of the Language Technology Group, for a
> distribution which includes both source and binaries.
> Bootstrapping issue; more work is needed on packaging and documentation.
> But project based funding means effort goes into new features &
> improvements rather than maintaining the core.
> What would *help* is a trickle of people knocking on our door and
> crucially *offering feedback* on how easy/hard the Geoparser was to get
> running, and where doc/install improvements would be most helpful.
> The door is ajar and now is a good time to give it a push, as LTG are
> migrating to a new, cleaned-up Subversion repository. I would love to see
> LTG bring the project to OSGeo Labs; they are researchers and the
> software is more of a side-effect than a product.