[GRASS-dev] g.mlist as C implementation?

Sun Sep 9 20:07:37 EDT 2007

Markus Neteler wrote:

> when using g.mlist in mapsets with thousands of maps in
> it (as happens with MODIS time series), it gets extremely
> slow. I wonder
> 
> - if a C implementation would be faster (I usually need
>   the * wildcard to match file names)

It's certainly possible to improve upon the efficiency of g.mlist,
although there's a lot which could be done without resorting to C.

The main thing which stands out is that it invokes "grep" once for
every name returned by g.list:

    list=""
    for i in `g.list type=$type mapset=$mapset | grep -v '^-\+$' | grep -v "files available" | grep -vi "mapset"`
    do
        if [ ! "$search" ] ; then
            list="$list $i"
        else
            list="$list `echo $i | grep \"$search\"`"
        fi
    done

Feeding a stream of names to a single grep process would be far more
efficient, as it starts one grep in total rather than one per map
name. The g.list output can be "unformatted" so that each name appears
on a separate line by filtering it through:

	sed 's/  */\<newline>/g'

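where <newline> is a literal newline character, i.e. the backslash is
immediately followed by a line break (\n works with GNU sed, but POSIX
doesn't require it, so it appears to be a GNU extension).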
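A minimal sketch of the single-process approach, assuming the listing arrives on standard input in the layout the existing loop filters for (dashed separator lines, a "files available" header, names separated by runs of spaces). list_names is a hypothetical helper, not part of GRASS; its optional first argument is the regular expression.

```shell
#!/bin/sh
# Hypothetical helper: list_names [REGEX] < listing
# Uses a fixed number of processes, independent of how many maps
# there are, instead of one grep per name.
list_names() {
    # '^--*$' is the POSIX spelling of '^-\+$' (a line of dashes).
    grep -v '^--*$' |
    grep -v "files available" |
    grep -vi "mapset" |
    # One name per line: the backslash is followed by a literal newline.
    # The "/g'" line must start in column 1, or the indent would become
    # part of the replacement text.
    sed 's/  */\
/g' |
    # Drop empty lines left over from leading or trailing spaces.
    grep -v '^$' |
    # A single grep for the search; "." matches any name if no REGEX.
    grep -e "${1:-.}"
}
```

The three header filters are carried over unchanged from the original loop, so, as there, any line of names containing the string "mapset" is discarded in full.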

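Also, use of "for var in `command` ..." is sub-optimal: the loop can't
start until the command has finished, and the entire output is then
subjected to word splitting and filename expansion. Using
"command | while read var ..." is preferable, as it processes one line
at a time as the output arrives.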
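A minimal sketch of the "command | while read var" form, assuming the names arrive one per line on standard input (e.g. g.list output after unformatting it with sed). print_maps is a hypothetical helper that just echoes each name.

```shell
#!/bin/sh
# Hypothetical helper: reads one map name per line on stdin and handles
# each as it arrives, instead of collecting the whole output first.
print_maps() {
    # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
    # The line is assigned whole, so it is never word-split or
    # filename-expanded as the words of `command` would be.
    while IFS= read -r name
    do
        if [ -n "$name" ]; then
            printf 'map: %s\n' "$name"
        fi
    done
}
```

One caveat: in many shells (dash, and bash by default) each part of a pipeline runs in a subshell, so a variable such as $list assigned inside the loop is not visible after it. Writing results to standard output, as the sketch does, sidesteps that.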

> - if yes, how complicated it is to write.

The code itself is fairly straightforward, provided that you can rely
upon the existence of fnmatch() and/or regexec() et al. The former is
POSIX.2, the latter POSIX.1. IOW, neither is standard on Windows.

DIY glob matching isn't hard if you impose a restriction that the
pattern may not contain more than one asterisk: check that the part
before the asterisk matches the beginning of the string, the part
after it matches the end, and the two don't overlap (i.e. the string
is at least as long as the pattern without the asterisk).
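The single-asterisk rule can be sketched as follows. It is written in POSIX sh only to stay in the script's language (a shell would normally just use case); glob1 is a hypothetical name, and every character other than the asterisk is compared literally, so '?' and '[...]' are not special.

```shell
#!/bin/sh
# Hypothetical helper: glob1 PATTERN NAME
# Returns 0 on a match, 1 on no match, 2 if PATTERN has more than one '*'.
glob1() {
    pat=$1 str=$2
    prefix=${pat%%\**}          # text before the first '*'
    if [ "$prefix" = "$pat" ]; then
        # No asterisk at all: the whole name must equal the pattern.
        [ "$str" = "$pat" ]
        return
    fi
    suffix=${pat#*\*}           # text after the first '*'
    case $suffix in
    *\**) return 2 ;;           # a second '*': outside the restriction
    esac
    # No overlap: the name must be at least as long as the pattern
    # without its asterisk, i.e. prefix plus suffix.
    [ ${#str} -ge $(( ${#prefix} + ${#suffix} )) ] || return 1
    # The start of the name must equal the prefix (quoted, so the removal
    # is literal); an empty prefix matches trivially.
    if [ -n "$prefix" ] && [ "${str#"$prefix"}" = "$str" ]; then
        return 1
    fi
    # Likewise, the end of the name must equal the suffix.
    if [ -n "$suffix" ] && [ "${str%"$suffix"}" = "$str" ]; then
        return 1
    fi
    return 0
}
```

For example, glob1 'MOD*.tif' MOD11A1.tif succeeds, while glob1 'ab*ba' aba fails on the length check even though "aba" both starts with "ab" and ends with "ba": the prefix and suffix would have to share the middle "b".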

Matching regular expressions is rather more complex; you don't want to
do it yourself.

-- 
Glynn Clements <glynn at gclements.plus.com>