[OSGeo-Discuss] (PDF Geotagging) Re: Fwd: Discuss Digest, Vol 61, Issue 16

KITAMOTO Asanobu kitamoto at nii.ac.jp
Fri Jan 20 10:07:34 EST 2012


Hello,

I recently joined the list, and I found the recent discussion about open
source geo-tagging interesting and helpful, because we are also working
on the "GeoNLP" project, which aims to build an open source geo-tagging
tool. The project web page is:

http://agora.ex.nii.ac.jp/GeoNLP/index.html.en

For the moment, we focus on the Japanese language, because Japanese is
quite different from European languages. Roughly speaking, our task is
two-fold.

1. Recognition: Morphological analyzers for Japanese have already been
developed to a satisfactory level, so we focus instead on how to
integrate various gazetteers into the standard dictionaries used for
morphological analysis.

2. Resolution: This is the more challenging part. We basically apply some
heuristics (such as the distance between place names and the type of
place name) to disambiguate them, but this sometimes fails because the
text does not contain enough information. (We don't have a post-editing
interface, though.)
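
A minimal sketch of this kind of resolution heuristic, in Python. It is
not the GeoNLP implementation: the toy gazetteer, the approximate
coordinates, the `TYPE_PENALTY_KM` table and the `resolve` function are
all assumptions made up for illustration. It picks, for each place name,
the candidate that keeps all mentioned places close together, with a
small penalty for less likely place types.

```python
import math
from itertools import product

# Toy gazetteer (hypothetical): surface form -> candidate entries.
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "Fuchu": [
        {"label": "Fuchu, Tokyo", "type": "city", "lat": 35.67, "lon": 139.48},
        {"label": "Fuchu Station, Tokyo", "type": "station", "lat": 35.67, "lon": 139.48},
        {"label": "Fuchu, Hiroshima", "type": "city", "lat": 34.57, "lon": 133.24},
    ],
    "Chofu": [
        {"label": "Chofu, Tokyo", "type": "city", "lat": 35.65, "lon": 139.54},
    ],
}

# Assumed preference between place types, expressed as an extra cost in km.
TYPE_PENALTY_KM = {"city": 0.0, "station": 5.0}


def haversine_km(a, b):
    """Great-circle distance in kilometres between two gazetteer entries."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a["lat"], a["lon"], b["lat"], b["lon"])
    )
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def resolve(names, gazetteer=GAZETTEER, type_penalty_km=TYPE_PENALTY_KM):
    """Choose one candidate per name, minimising distance plus type penalty."""
    best, best_cost = (), math.inf
    # Try every combination of candidates (fine for short texts only).
    for combo in product(*(gazetteer[name] for name in names)):
        cost = sum(
            haversine_km(a, b)
            for i, a in enumerate(combo)
            for b in combo[i + 1:]
        )
        cost += sum(type_penalty_km.get(c["type"], 0.0) for c in combo)
        if cost < best_cost:
            best, best_cost = combo, cost
    return [c["label"] for c in best]


print(resolve(["Fuchu", "Chofu"]))
```

With both names present, the nearby Chofu pulls "Fuchu" towards the Tokyo
candidate. With `resolve(["Fuchu"])` alone there is no distance evidence,
so the two cities tie and the first one listed wins, which is the
"insufficient information in text" problem in miniature.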

We want to make this system open source so that we can collaboratively
develop algorithms and gazetteers. I always feel that many
reinvent-the-wheel hacks have been made in Japan, yet we still do not
have a good system through which to share our experiences and
knowledge. I hope GeoNLP will evolve into that kind of system.

We found, however, that making our system open source is more difficult
than we first thought, because dictionaries (gazetteers) are also an
indispensable component of the system. It seems easier to manage them
in a central repository and provide them through a Web service. In
addition, we may need to "personalize" the choice of gazetteers for
different purposes, because overly large gazetteers tend to introduce
words that look like general nouns. We still don't have a good solution
for these problems.
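
As a rough illustration of what "personalizing" the choice of gazetteers
could mean, here is a small Python sketch. The gazetteer names, their
contents, the `general_nouns` list and the `build_dictionary` function
are hypothetical and only show the idea of selecting gazetteers per
purpose and dropping entries that collide with general nouns.

```python
# Hypothetical gazetteers, keyed by name; entries are romanized for readability.
GAZETTEERS = {
    "prefectures": {"Tokyo", "Chiba"},
    # "Sakura" and "Hikari" are city names that also read as ordinary
    # words (cherry blossom, light).
    "cities": {"Sapporo", "Sakura", "Hikari"},
}


def build_dictionary(gazetteers, selected, general_nouns=frozenset()):
    """Merge the selected gazetteers and drop entries listed as general nouns."""
    entries = set()
    for name in selected:
        entries |= gazetteers[name]
    return entries - set(general_nouns)


# A purpose where ambiguous names matter less: use everything.
print(sorted(build_dictionary(GAZETTEERS, ["prefectures", "cities"])))

# A purpose where false matches hurt: exclude names that look like nouns.
print(sorted(build_dictionary(
    GAZETTEERS, ["prefectures", "cities"], general_nouns={"Sakura", "Hikari"}
)))
```

Which gazetteers to select and which words to exclude would still have to
be decided per application, so this does not solve the problem above; it
only shows where such a choice would plug in.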

The project is still in pre-alpha status, and the Web API is not yet
open to the public, but the API itself is already working at a
reasonable speed. We are now mapping tweets and online news, which has
become a simple task: we just pass the original text as a parameter to
the API. One example is this map of tweets (sorry, it is in Japanese):

http://agora.ex.nii.ac.jp/futtekitter/snow-or-rain/
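
Since the Web API is not public, the following Python sketch only shows
the shape of the "text in, map points out" step. The `fake_geotag`
function is a stand-in for the real API call, and its output fields
(`name`, `lat`, `lon`) are assumptions, not the actual GeoNLP response
format. The result is a GeoJSON FeatureCollection that a web map can
display.

```python
import json


def fake_geotag(text):
    """Stand-in for a geo-tagging API call (hypothetical output format)."""
    known = {"Sapporo": (43.06, 141.35)}  # approximate lat, lon
    return [
        {"name": name, "lat": lat, "lon": lon}
        for name, (lat, lon) in known.items()
        if name in text
    ]


def to_geojson(text, geotag):
    """Geo-tag one text and wrap the places in a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            # GeoJSON positions are ordered longitude, latitude.
            "geometry": {
                "type": "Point",
                "coordinates": [place["lon"], place["lat"]],
            },
            "properties": {"name": place["name"], "text": text},
        }
        for place in geotag(text)
    ]
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(to_geojson("It is snowing in Sapporo", fake_geotag), indent=2))
```

Passing the tagger in as a function keeps the mapping code independent of
how the places are obtained, so the stand-in could later be replaced by a
real API client.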

Our progress is slow, and the more we work on it the more we realize
the fundamental difficulty of this research area, so any comments and
collaboration are welcome.

Best Regards,
=============
Asanobu KITAMOTO
National Institute of Informatics
http://agora.ex.nii.ac.jp/~kitamoto/


(2012/01/19 21:12), Jo Walsh wrote:
> hello OSGeoids!
>
>> Thank you very much, I will try your API as soon as possible. Is there
>> a way to have access to the source code behind your web service ? Or
>> isn't it open source ?
>>
>> El 2012-01-17 12:11, James Reid escribió:
>>> You could check out our Unlock Text service at:
>>>
>>> http://unlock.edina.ac.uk/texts/introduction
>
> James nudged me in the direction of this thread.
>
> The Edinburgh Geoparser is open source *in principle* (GPL).
> In the short term the best way to get hold of a copy is to email Claire
> Grover ( grover at inf.ed.ac.uk ) of the Language Technology Group, for a
> distribution which includes both source and binaries.
>
> Bootstrapping issue; more work is needed on packaging and documentation.
> But project based funding means effort goes into new features &
> improvements rather than maintaining the core.
>
> What would *help* is a trickle of people knocking on our door and
> crucially *offering feedback* on how easy/hard the Geoparser was to get
> running, and where doc/install improvements would be most helpful.
>
> The door is ajar and now is a good time to give it a push, as LTG are
> migrating to a new cleaned up subversion repository. I would love to see
> LTG bring the project to OSGeo Labs; they are researchers and the
> software is more of a side-effect than a product.
>