[gdal-dev] Heuristics to classify raster data ?

Thu Mar 6 23:38:48 PST 2014

If the goal really is to tell a map from a photograph, could you base the decision on the presence of text? An OCR pass could check whether more than x words are found in the image.

-i

From: gdal-dev-bounces at lists.osgeo.org [mailto:gdal-dev-bounces at lists.osgeo.org] On behalf of Dmitriy Baryshnikov
Sent: Thursday, 6 March 2014 21:16
To: gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Heuristics to classify raster data ?

Hi Even,

Much depends on what kind of imagery and maps you wish to classify. If the maps are classical scanned paper maps and you want a fast algorithm, the crosses of the meter or degree grid can be a good pattern to look for.
This will not work if aerial images are involved, as such images have crosses too. Satellite images do not. The map frame may also be a good pattern.

If you only have fragments of maps and images, I think some content analysis is needed:
- clustering, e.g. http://en.wikipedia.org/wiki/K-means_clustering
- a neural network with learning
- a support vector machine, e.g. http://svmlight.joachims.org/ and http://en.wikipedia.org/wiki/Support_vector_machine
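
As a rough illustration of the clustering option, here is a minimal k-means sketch in plain Python. It assumes each raster has already been reduced to a small feature vector; which features to use, the fixed iteration count, and the choice of the first k points as initial centers are all assumptions made for this example, not something proposed in the thread.

```python
def kmeans(points, k=2, iterations=20):
    """Tiny k-means on a list of equal-length numeric tuples.

    Initial centers are simply the first k points (an assumption made so
    that the example is deterministic; real code would pick them better).
    """
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

The two resulting clusters are unlabeled: deciding which one is "maps" and which one is "imagery" still requires looking at a few members of each.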

Some kind of hash comparison can also be used (rather fast):
- perceptual hash comparison, e.g. http://www.phash.org/

In all cases the input images should be resized to some small size, and possibly converted to grayscale or binarized, before analysis.
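
To make the resize, grayscale, and binarize steps concrete, here is a small sketch of an "average hash" in plain Python. Note that this is a simpler relative of the DCT-based hashes offered by pHash, not the pHash algorithm itself. The image is assumed to be a list of rows of (r, g, b) tuples at least 8 pixels on each side; the function names and the 8x8 size are choices made for this example.

```python
def to_gray(pixels):
    """Convert rows of (r, g, b) tuples to integer luminance values."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in pixels]

def shrink(gray, size=8):
    """Downscale to size x size by averaging blocks (needs dims >= size)."""
    h, w = len(gray), len(gray[0])
    out = []
    for j in range(size):
        y0, y1 = j * h // size, (j + 1) * h // size
        row = []
        for i in range(size):
            x0, x1 = i * w // size, (i + 1) * w // size
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

def average_hash(pixels, size=8):
    """Binarize the shrunken image against its mean and pack the bits."""
    flat = [v for row in shrink(to_gray(pixels), size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

A small Hamming distance between the hash of an unknown raster and the hash of a known reference suggests similar coarse structure. How useful that is for the map versus imagery question would have to be checked on real data.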

Best regards,

    Dmitry
06.03.2014 23:19, Even Rouault wrote:

Hi,

I'd be interested in an algorithm to automate the classification of raster data between maps (let's say renderings of OpenStreetMap data, or other digital maps) on one side and aerial/satellite imagery on the other side, without looking at metadata (bare GeoTIFF typically). This is to help automate the bulk import of data from media and establish a first level of classification.

Has anyone already done that and has code and/or advice to share, or knows of a software project that would do that ?

Some ideas that came to my mind :

- maps typically have a much smaller number of colors than imagery, but you may have imagery that has already been reduced to 256 colors to save storage space.

- maps generally have a majority color (e.g. white, green), but not in all zones (urban zones will have more features)

- maps have higher spatial frequency (lines, text) whereas imagery will be more continuous : use the gradient, and compute statistics on it ?
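
The three ideas above can be sketched as simple per-image features in plain Python. This is only an illustration under assumptions: the image is taken to be a list of rows of (r, g, b) tuples (for example read from a GeoTIFF beforehand), the function names are made up for this example, and no decision thresholds are suggested since those would have to be tuned on real data.

```python
from collections import Counter

def color_features(pixels):
    """Return (number of distinct colors, share of the most common color)."""
    flat = [p for row in pixels for p in row]
    counts = Counter(flat)
    dominant_fraction = counts.most_common(1)[0][1] / len(flat)
    return len(counts), dominant_fraction

def to_gray(pixels):
    """Convert rows of (r, g, b) tuples to integer luminance values."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in pixels]

def gradient_stats(gray):
    """Return (mean gradient magnitude, share of pixels with zero gradient).

    The gradient is approximated by forward differences to the right and
    downward neighbors, using |dx| + |dy| as the magnitude.
    """
    h, w = len(gray), len(gray[0])
    mags = []
    for y in range(h - 1):
        for x in range(w - 1):
            dx = gray[y][x + 1] - gray[y][x]
            dy = gray[y + 1][x] - gray[y][x]
            mags.append(abs(dx) + abs(dy))
    mean = sum(mags) / len(mags)
    flat_fraction = sum(1 for m in mags if m == 0) / len(mags)
    return mean, flat_fraction
```

A rendered map would be expected to show few distinct colors, a large dominant-color share, and many perfectly flat pixels interrupted by sharp edges, while imagery would tend toward the opposite. The 256-color caveat mentioned above still applies to the first feature.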

Even
