[gdal-dev] GDAL correlator [thoughts]

Cristian Balint cristian.balint at gmail.com
Tue Jan 22 03:46:59 PST 2013


Dear All,


 - In recent days I saw this thread:
http://lists.osgeo.org/pipermail/gdal-dev/2012-August/033897.html
concerning a feature requested for GDAL. I have also been looking at
the initial requirements here:
http://trac.osgeo.org/gdal/wiki/Correlator


* I walked through the thread and found some issues and concerns. You
may find some of my comments and observations interesting:

a) Looking through the proposed code, it seems that only
co-registration of two images at a time is possible (?), and the
implementation uses the SURF keypoint detector.
   - SURF and SIFT are patented algorithms; even OpenCV decided to
move such code into a sub-folder called "non-free", see here:
    http://code.opencv.org/projects/opencv/repository/revisions/master/show/modules/nonfree/src
   - These, and any other implementations of the original SIFT or SURF
papers, are considered "non-free"; however, there are much better
options than those two.

b) There are free alternatives such as BRISK, FREAK and ORB (in order
of their quality) that operate in Hamming space instead of Euclidean
space, which gives very fast and dense matching. They can also be told
the exact number of features to extract (SIFT, for example, cannot),
so extraction can be capped at the X best-quality keypoints (see the
first sketch after this list).

c) I can't see the FLANN matcher anywhere in the code; FLANN provides
a huge speed-up both for Euclidean- and Hamming-space descriptors.
Also, a little post-filtering and validation with a small homography
calculation using RANSAC would give a big boost to the later
gradient-descent steps, and it would take less than an additional
millisecond to find and eliminate outliers as a post-process after
keypoint matching. It can also decide whether a pair is matchable at
all, so gradient descent is not overloaded with too many bad outliers
(see the second sketch after this list).

d) Regarding multi-band rasters, a good approach for RGB rasters is
the weighted sum 0.299*R + 0.587*G + 0.114*B (this is exactly what
e.g. OpenCV does for RGB-to-gray conversion; the weights come from the
ITU-R BT.601 luma definition), but in my own experience a plain
(R+G+B)/3 average gives very similar results and works with any N-band
raster. In the worst case of too many raster bands, the user could
select which bands to take into consideration at the matching step
(see the third sketch after this list).

e) I don't get the point of the "segmentation". Perhaps the idea arose
because SURF or SIFT detection over hi-res (>50k x 50k) rasters is
painfully slow, but that is not at all the case with BRISK or similar
descriptors. The approach of chopping rasters into tiles is not bad in
itself, but take into account that it will significantly increase the
amount of matching required across pairs, since the number of rasters
grows.
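
For (b), here is a minimal sketch of capping the feature budget with a
binary descriptor, assuming Python and a recent OpenCV build (the file
name and the budget of 5000 are only illustrative):

    import cv2

    img = cv2.imread("left.tif", cv2.IMREAD_GRAYSCALE)
    # ORB lets us ask up front for only the N best keypoints,
    # something SIFT cannot do.
    orb = cv2.ORB_create(nfeatures=5000)
    kp, des = orb.detectAndCompute(img, None)
    # 'des' is an N x 32 uint8 matrix of binary descriptors,
    # compared with Hamming distance instead of Euclidean.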
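
For (c), a sketch of the same pipeline extended with FLANN matching in
Hamming space plus RANSAC outlier rejection; the LSH parameters and
thresholds are commonly used defaults, not tuned values:

    import numpy as np
    import cv2

    img1 = cv2.imread("left.tif", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("right.tif", cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create(nfeatures=5000)
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    FLANN_INDEX_LSH = 6  # the LSH index handles binary descriptors
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=FLANN_INDEX_LSH, table_number=6,
             key_size=12, multi_probe_level=1),
        dict(checks=50))
    matches = flann.knnMatch(des1, des2, k=2)

    # Ratio test, then a small RANSAC homography to reject outliers.
    good = [p[0] for p in matches
            if len(p) == 2 and p[0].distance < 0.8 * p[1].distance]
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
    # Few or no inliers in 'mask' means the pair is simply not
    # matchable; only inliers go on to the gradient-descent stage.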
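
For (d), a sketch of reducing an N-band GDAL raster to a single gray
band with the plain average, assuming the Python GDAL bindings and
numpy (the file name is illustrative):

    import numpy as np
    from osgeo import gdal

    ds = gdal.Open("multiband.tif")
    bands = [ds.GetRasterBand(i + 1).ReadAsArray().astype(np.float32)
             for i in range(ds.RasterCount)]
    gray = sum(bands) / len(bands)          # plain (B1+...+BN)/N
    gray = np.uint8(np.clip(gray, 0, 255))  # 8-bit input for detectors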

-------

* Performance boosts can be achieved, and n-to-n raster collections
can be addressed:

1) Addressing co-registration of a massive collection of datasets (not
just a single pair at a time) without any prior knowledge of their
relationships requires brute-forcing every image against every other,
i.e. a minimum of n*(n-1)/2 pair tests; the best per-pair matching
speed could be ~10 ms (e.g. using BRISK in its best configuration with
FLANN). For 1000 rasters that is 499,500 pairs, or roughly 83 minutes
at 10 ms per pair.
2) If prior knowledge exists (e.g. the rasters have a declared
coordinate frame), it is much easier: only overlapping rasters must be
checked and registered, perhaps in a much denser and deeper way, with
more features extracted per image.
3) Regarding smooth fusion of pixels and gradient-descent error
minimization, OpenCV has very nice routines for this (using
state-of-the-art Levenberg-Marquardt minimization), with options to
select a projection mode. With some additional serialization tricks
(keeping only small blocks of the rasters in memory at once, which
GDAL routines handle with ease) to avoid OpenCV's memory-greedy
internal raster routines, it can scale to a massive (effectively
unbounded) number of pixels for any seamless fusion process (see the
first sketch after this list).
4) To store the computed feature points and descriptors as temporary
metadata, a very good and elegant way I have found is to attach an
.HDF5 file to each raster. HDF5 can store multidimensional data in
so-called "compound" types, so points and their descriptors can be
enumerated in a clean and very fast way (see the second sketch after
this list). More data, such as homography matrices or other
constraints, can also be attached to the HDF5 file in a highly
organized way, and there is even room for small raster bands (which
any OSGeo tool can read natively) holding debug information, such as
the relationship matrix of the whole project, for more advanced users.
I gave up on sqlite and xml metadata after discovering what the HDF
formats can do.
5) There may be cases where so-called "singleton groups" appear during
n-to-n collection matching: the rasters divide into clusters with no
relations between them. In this case a segmentation of the
cross-matching matrix can reveal and identify such groups (see the
third sketch after this list), and an OGR layer can be generated to
reflect these areas.
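
For point 3, a sketch of the kind of block-wise reading that keeps
memory bounded, using the Python GDAL bindings (the 1024-pixel block
size is only illustrative):

    from osgeo import gdal

    ds = gdal.Open("huge.tif")
    band = ds.GetRasterBand(1)
    bx = by = 1024  # modest tiles instead of the whole raster
    for yoff in range(0, ds.RasterYSize, by):
        for xoff in range(0, ds.RasterXSize, bx):
            w = min(bx, ds.RasterXSize - xoff)
            h = min(by, ds.RasterYSize - yoff)
            block = band.ReadAsArray(xoff, yoff, w, h)
            # feed 'block' to the fusion/minimization step, discard it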
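
For point 4, a sketch of the HDF5 "compound" layout, assuming h5py and
BRISK's 64-byte descriptors (file and dataset names are illustrative):

    import numpy as np
    import h5py
    import cv2

    img = cv2.imread("scene.tif", cv2.IMREAD_GRAYSCALE)
    kp, des = cv2.BRISK_create().detectAndCompute(img, None)

    # One record per keypoint: position, scale, angle, raw descriptor.
    kp_t = np.dtype([("x", "f4"), ("y", "f4"), ("size", "f4"),
                     ("angle", "f4"), ("desc", "u1", (64,))])
    recs = np.zeros(len(kp), dtype=kp_t)
    for i, (k, d) in enumerate(zip(kp, des)):
        recs[i] = (k.pt[0], k.pt[1], k.size, k.angle, d)

    with h5py.File("scene.tif.h5", "w") as f:
        f.create_dataset("keypoints", data=recs, compression="gzip")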
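
For point 5, a sketch of identifying those clusters by running
connected components over the cross-matching matrix, assuming scipy
and a toy boolean pair-match table:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components

    # Toy table: rasters 0-1-2 overlap, 3-4 overlap, 5 matches nothing.
    match_matrix = np.zeros((6, 6), dtype=bool)
    match_matrix[0, 1] = match_matrix[1, 2] = match_matrix[3, 4] = True

    n_groups, labels = connected_components(csr_matrix(match_matrix),
                                            directed=False)
    for g in range(n_groups):
        members = np.where(labels == g)[0]
        # size-1 groups are the "singleton" rasters; each group's
        # footprint could be written out as an OGR layer for review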

------

- I am currently doing such work for a public 3D SfM project that
derives 3D point clouds from plain raster collections:
http://code.google.com/p/featsfm. The keypoint detection and matching
parts are well implemented so far; if someone wants to evaluate them,
let me know. The final code will be released around March anyway. I
also try to improve things constantly, e.g. using recent sparse
descriptors like http://cvlab.epfl.ch/research/detect/dbrief,
extremely fast sparse descriptors that could do an a-priori
correlation pass over massive datasets (>10k) as a first step,
followed by a second step (after the relations between pairs are
determined) using e.g. BRISK to do a very deep matching of the known
pairs. I am also experimenting with the fairly recent MIH matcher,
http://www.cs.toronto.edu/~norouzi/research/mih, instead of FLANN; it
can match 10k keypoints against 4M keypoints in under 4 s (and it does
"exact" matching, not approximate like FLANN), but the drawback is
that all keypoints must be present in memory at once. So some kind of
"block" logic is needed, matching one image against multiple others at
a time while taking care not to unload and reload the same parts back
and forth in memory too many times (it is trickier, and needs clear
bookkeeping to avoid duplicate matches).
- If one wants an "insane" level of post co-registration (an effective
per-pixel warp, not an affine or projective one), there are much
deeper approaches using a special class of descriptors like DAISY
combined with statistical models (Bayes/Markov; I have tested several
papers and Matlab implementations), which can achieve such fine
co-registration that accurate DEMs can be extracted with ease. It is
even possible to co-register rasters taken at very different times,
e.g. winter 2004 vs. summer 2012 (I have tried such an experiment,
with excellent results), which is considered very challenging; in that
case a difference analysis becomes possible.


 - To give a basic idea: my own matcher implementation does something
like 10k vs. 10k BRISK features using FLANN in 25 ms on a decent
2.5 GHz core, and the deepest case, 100k vs. 100k features, takes
~3 s; very good timings. A simple calculation with a 25 ms average
matching time and 1000 rasters gives: 1000 * 999 / 2 * 25 ms = 3.46875
hours. The suite I designed is able to derive 3D clouds from thousands
of hi-res UAV images with relative ease. Sparse DBRIEF can go well
under 1 ms per match; going "sparse" is a new field to explore.

- If you are interested, we can improve the proposed GDAL tool in some
of the ways described above, taking easy steps, so let me know. I am
willing to contribute, but we first need to settle, in detail, exactly
how you folks would like to do it. Honestly, there are plenty of
directions we could take.




With Respect,
-- 
Balint Cristian
e-mail: cristian.balint at gmail.com
...........................................................................

