[gdal-dev] Heuristics to classify raster data ?
Ivan Price
Ivan.Price at noveltis.fr
Thu Mar 6 23:38:48 PST 2014
If the goal really is to tell a map from a photograph, could you base the decision on the presence of text, and use an OCR mechanism to check whether more than x words are found in the image?
-i
From: gdal-dev-bounces at lists.osgeo.org [mailto:gdal-dev-bounces at lists.osgeo.org] On behalf of Dmitriy Baryshnikov
Sent: Thursday, 6 March 2014 21:16
To: gdal-dev at lists.osgeo.org
Subject: Re: [gdal-dev] Heuristics to classify raster data ?
Hi Even,
Most of it depends on what kind of imagery and maps you wish to classify. If the maps are classic scanned paper maps and you want a fast algorithm, the crosses of the meter or degree grid can be a good pattern to look for.
With aerial images this will not work, though, as such images have crosses too; satellite images do not. Maybe some part of the map frame can be a good pattern.
If you only have fragments of maps and images, I think some content analysis is needed:
- clustering, e.g. http://en.wikipedia.org/wiki/K-means_clustering
- a neural network with training
- a support vector machine, e.g. http://svmlight.joachims.org/ and http://en.wikipedia.org/wiki/Support_vector_machine
Some hash comparison can also be used (rather fast):
- perceptual hash comparison, e.g. http://www.phash.org/
In all cases the input images should be resized to a small size, and possibly converted to grayscale or binarized, before analysis.
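The preprocessing and hash comparison described above could look roughly like the sketch below. It uses an "average hash", a very simple kind of perceptual hash, and not the DCT-based algorithm of the pHash library. The function names, the 8x8 size and the representation of images as nested lists of (r, g, b) tuples are assumptions made for illustration only; in practice the pixels would be read with GDAL or another imaging library.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for r, g, b in row]
            for row in rgb]


def downscale(gray, size=8):
    """Shrink a grayscale grid to size x size by averaging pixel blocks."""
    h, w = len(gray), len(gray[0])
    out = []
    for i in range(size):
        y0 = i * h // size
        y1 = max((i + 1) * h // size, y0 + 1)  # at least one source row
        row = []
        for j in range(size):
            x0 = j * w // size
            x1 = max((j + 1) * w // size, x0 + 1)  # at least one source column
            block = [gray[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(gray, size=8):
    """Binarize the downscaled image against its mean; pack the bits into an int."""
    flat = [v for row in downscale(gray, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | int(v > mean)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes (0 means identical)."""
    return bin(a ^ b).count("1")
```

Two images would then be compared with hamming(average_hash(g1), average_hash(g2)): a small distance suggests similar coarse structure. Which reference hashes to compare against, and what distance counts as "close", would have to be determined on real data.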
Best regards,
Dmitry
06.03.2014 23:19, Even Rouault wrote:
Hi,
I'd be interested in an algorithm to automate the classification of raster data
between maps (let's say renderings of OpenStreetMap data, or other digital
maps) on one side and aerial/satellite imagery on the other side, without
looking at metadata (bare GeoTIFF typically). This is to help in automating
the bulk import of data from a medium and establishing a first level of
classification.
Has anyone already done that and has code and/or advice to share, or knows of a
software project that would do that ?
Some ideas that came to my mind :
- maps typically have a much smaller number of colors than imagery, but
you may have imagery that has already been reduced to 256 colors to save
storage space.
- maps generally have a majority color (e.g. white, green), but not in all
zones (urban zones will have more features)
- maps have higher spatial frequency (lines, text) whereas imagery will be
more continuous : use the gradient, and compute statistics on it ?
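As a rough illustration, the three ideas above can be turned into numeric features as in the sketch below. It only computes the features and does not decide "map" versus "imagery": any thresholds would have to be tuned on real data. The function name raster_features, the sharp_step parameter and the nested-list pixel representation are assumptions for the example, not part of GDAL.

```python
from collections import Counter


def to_gray(px):
    """Integer luminance (0-255) of an (r, g, b) tuple."""
    r, g, b = px
    return (299 * r + 587 * g + 114 * b) // 1000


def raster_features(rgb, sharp_step=64):
    """Compute color-count, majority-color and gradient statistics.

    rgb is a list of rows, each row a list of (r, g, b) tuples.
    sharp_step is a hypothetical cut-off for calling a gray-level jump
    between neighbouring pixels "sharp" (line or text edge).
    """
    counts = Counter(px for row in rgb for px in row)
    n_pixels = sum(counts.values())

    gray = [[to_gray(px) for px in row] for row in rgb]
    h, w = len(gray), len(gray[0])
    steps = []  # absolute differences between right and lower neighbours
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                steps.append(abs(gray[y][x + 1] - gray[y][x]))
            if y + 1 < h:
                steps.append(abs(gray[y + 1][x] - gray[y][x]))

    n_steps = len(steps)
    return {
        "distinct_colors": len(counts),
        "dominant_share": counts.most_common(1)[0][1] / n_pixels,
        "mean_gradient": sum(steps) / n_steps if n_steps else 0.0,
        "sharp_share": (sum(s >= sharp_step for s in steps) / n_steps
                        if n_steps else 0.0),
    }
```

On a rendered map one would expect few distinct colors, a large dominant share, and gradients that are mostly either zero or sharp; on imagery, many colors and many small, non-zero gradients. Running this on a downsampled overview rather than the full raster should keep it cheap for bulk imports.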
Even