[GRASS-dev] anyone interested in helping to port an algorithm from matlab?

Dylan Beaudette dylan.beaudette at gmail.com
Thu Nov 6 01:52:15 EST 2008


On Wed, Nov 5, 2008 at 7:39 PM, Hamish <hamish_b at yahoo.com> wrote:
> Dylan Beaudette wrote:
>> # accumulate new quadrants
>> # WTF ?
>> cc = [cc;cord]
>> Qc=[Qc;Qh]
>>
>> if cc is matrix, and cord is a matrix, what exactly is
>> happening?
>
> "coord" is appended to the end of "cc". e.g.:
>
>>> a = [ 10 20 ]
>>> b = [ 15 25 ]
>>> c = [a; b]
> c =
>
>    10    20
>    15    25
>
>>> d = [ a; b; a ]
> d =
>
>    10    20
>    15    25
>    10    20
>
>
>> I have a sneaking suspicion that this algorithm is not very efficient
>> (the fortran version), as it takes quite a while (well 3 minutes) to
>> process a 400x400 grid.
>
> in the above example, appending the variable has the effect of resizing
> the array, which is very very slow. Whenever possible it is better to
> determine/guess the final size of the array and create it (full of NaNs)
> and populate it row by row. "cc = [cc; next]" will get slower and slower
> as the iterations grow. You can crop off any leftover NaNs once finished.

I thought that this was what was happening, but could not verify it.
In R this would be accomplished by 'row-binding' things together with
rbind(). Same problem in R when dynamically re-sizing an array/matrix-
usually the same strategy for speeding it up. I haven't run the
program in Octave/Matlab, so I am not sure how it performs compared to
the fortran version... probably a lot slower.

> many "for" loops in matlab can be converted to array calcs greatly
> speeding things up. if things are done right, nested for loops should
> be a rare sight.

Ditto for R-- vectorized approaches are usually preferred. If I ever
did finish porting this thing to R, I would try and convert all of the
for() loops into *apply() statements and the like.

> matlab disk/sscanf i/o has traditionally been much much slower than C,
> although I think this has somewhat improved with newer versions (or it
> just seems that way because the CPUs get faster and faster?). I usually
> try to use UNIX tools (sed/awk/...) to pre-process ascii data before
> load()ing into matlab. you can use !cmd or system() calls in matlab to
> do that within the script.

OK-- good tips. I think that this algorithm really belongs in
something like C, as it is not much more than a simple set of wrappers
to a recursive partitioning approach.

>
> The Fortran compiler you use can greatly affect run time. Using something
> modern?
>

I was only able to compile it with f95 -- not sure if that is just an
alias for something in the gcc collection or not.

In thinking about how to implement this as a GRASS function, I wonder
if code could be borrowed / shared with the segment library-- which
essentially builds a quadtree for the partitioning of vector points.
That approach might be able to be modified to partition the region
based on rectangles of equal variance, instead of equal number of
points. I may contact Helena and see what she thinks.

Cheers,

Dylan


>
> Hamish
>
>
>
>
>
>


More information about the grass-dev mailing list