[gdal-dev] ocr(opus) and ogr

Carlo A. Bertelli (Charta s.r.l.) bertelli at charta.acme.com
Sun May 5 23:43:12 PDT 2013


Hello,
I had a fairly clean map with street codes, a very good candidate for
optical character recognition, so I ran it through some of the
available OCR engines and applications, just to see what would happen.
To my surprise, ocropus (https://code.google.com/p/ocropus/) produced something
useful. Its output is an XHTML file with pixel-based *coordinates* (may I
add some exclamation marks here?).
Here is an example:
   ‹span class="ocr_line" title="bbox 6309 5042 6465 5085"›506V‹/span›
I hope I can do something with it.
One good option would be to write a specialized ocroscript (which is the
command ocropus uses). Ocropus is written in python, so it shouldn't be
impossible.
But even with search and replace I could obtain a reasonable csv/xml file.
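
As a rough illustration of the search-and-replace idea, here is a minimal Python sketch that pulls the bbox and text out of ocr_line spans and writes one CSV row per label, using the centre of each box as the point. The function names (hocr_to_rows, rows_to_csv) and the column names are made up for this example, and it assumes every span looks like the sample above (class before title, plain text inside, ordinary angle brackets in the file).

```python
import csv
import io
import re

# Matches spans shaped like the sample line:
#   <span class="ocr_line" title="bbox 6309 5042 6465 5085">506V</span>
# Assumption: class comes before title and the span holds plain text only.
OCR_LINE = re.compile(
    r'<span[^>]*class="ocr_line"[^>]*'
    r'title="bbox (\d+) (\d+) (\d+) (\d+)"[^>]*>(.*?)</span>',
    re.S,
)


def hocr_to_rows(hocr_text):
    """Return (x, y, label) tuples, with x/y the bbox centre in pixels."""
    rows = []
    for x0, y0, x1, y1, text in OCR_LINE.findall(hocr_text):
        x0, y0, x1, y1 = (int(v) for v in (x0, y0, x1, y1))
        rows.append(((x0 + x1) / 2.0, (y0 + y1) / 2.0, text.strip()))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["x", "y", "label"])  # hypothetical column names
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    sample = '<span class="ocr_line" title="bbox 6309 5042 6465 5085">506V</span>'
    print(rows_to_csv(hocr_to_rows(sample)))
```

The coordinates are still in pixels, counted from the top-left corner of the image, so y grows downwards.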
The remaining problem is the pixel-based coordinates. I think
this could be solved with some proj magic, feeding the result to ogr2ogr, and
voilà...
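
If the scanned map is georeferenced, the pixel-to-map step may not need proj at all: GDAL describes it with a six-number affine geotransform. The sketch below applies that formula in plain Python. The geotransform values are invented for illustration (0.5 map units per pixel, north-up, arbitrary origin); for a real raster they would come from its georeferencing, for instance from GetGeoTransform() in the GDAL Python bindings. The name pixel_to_map is hypothetical.

```python
def pixel_to_map(gt, col, row):
    """Apply a GDAL-style affine geotransform to a pixel position.

    gt is (origin_x, pixel_width, row_rotation,
           origin_y, col_rotation, pixel_height);
    pixel_height is normally negative for north-up images.
    """
    x = gt[0] + col * gt[1] + row * gt[2]
    y = gt[3] + col * gt[4] + row * gt[5]
    return x, y


# Made-up geotransform, for illustration only.
EXAMPLE_GT = (500000.0, 0.5, 0.0, 4900000.0, 0.0, -0.5)

if __name__ == "__main__":
    # Centre of the sample bbox "6309 5042 6465 5085".
    print(pixel_to_map(EXAMPLE_GT, 6387.0, 5063.5))
```

The resulting x/y values are in the raster's own coordinate system, so they can be written to a CSV and handed to ogr2ogr like any other point data.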
Being able to handle pixel-based coordinates would also enable
basic raster-to-vector conversion, which would be useful.
Could someone help me?
c

-- 
--------------------------------------------------------------------------
Carlo A. Bertelli
   Charta servizi e sistemi per il territorio e la storia ambientale srl
          Dipendenze del palazzo Doria,
          vc. alla Chiesa della Maddalena 9/2 16124      Genova (Italy)
          tel. +39(0)10 2475439  fax +39(0)10 2475439  gsm:+39 393 1590711
--------------------------------------------------------------------------