[GRASS-dev] Re: regex problem for r.in.wms

Glynn Clements glynn at gclements.plus.com
Wed Mar 12 01:20:31 EDT 2008


Hamish wrote:

> upon reflection, the greediness of regex would make the original
> '[^<]*' match until the last < on the line, not the one next found.
> (to stop at the next found you would use '[^<]*?')

No; [^<] won't match a < regardless of whether the repetition is
greedy (*) or non-greedy (*?).
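
A quick way to check this, sketched in Python (whose re module has the
PCRE-style *? operator). The sample line and its tag names are made up
for illustration and are not taken from r.in.wms:

```python
import re

# Hypothetical sample line; the tag names are assumptions for illustration.
line = "<Title>Global Mosaic</Title><Name>global_mosaic</Name>"

# Negated class: stops at the next '<' whether the repetition is greedy or not.
greedy_class = re.search(r"<Title>([^<]*)<", line).group(1)
lazy_class = re.search(r"<Title>([^<]*?)<", line).group(1)

# Dot: the greedy form runs to the last '<' on the line,
# the non-greedy form stops at the next one.
greedy_dot = re.search(r"<Title>(.*)<", line).group(1)
lazy_dot = re.search(r"<Title>(.*?)<", line).group(1)

print(greedy_class)  # Global Mosaic
print(lazy_class)    # Global Mosaic
print(greedy_dot)    # Global Mosaic</Title><Name>global_mosaic
print(lazy_dot)      # Global Mosaic
```

Only the '.*' form overshoots; the '[^<]*' forms give the same result
either way.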

Non-greedy repetitions are only needed when the base pattern can also
match whatever follows the repetition. In that case, greedy
repetitions prefer to continue (matching the character(s) as part of
the repetition), while non-greedy repetitions prefer to terminate
(matching the character(s) against the following expression).

E.g. for the string aaabbb, (.*)(b+) will match with \1=aaabb,\2=b,
while (.*?)(b+) will match with \1=aaa,\2=bbb.
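
The same example can be reproduced with Python's re module, used here
only because it supports non-greedy repetition; this is a minimal sketch,
not part of the r.in.wms script:

```python
import re

text = "aaabbb"

# Greedy: .* takes as much as it can, leaving a single 'b' for b+.
greedy = re.match(r"(.*)(b+)", text).groups()

# Non-greedy: .*? stops as soon as b+ can match, so b+ takes every 'b'.
lazy = re.match(r"(.*?)(b+)", text).groups()

print(greedy)  # ('aaabb', 'b')
print(lazy)    # ('aaa', 'bbb')
```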

Also, note that non-greedy repetitions aren't portable. They exist in
PCRE, and some other regex implementations (e.g. [X]Emacs) have them. 
I don't think that they're supported by the GNU libc functions
(regcomp, regexec), and they certainly aren't specified in POSIX.

For sed, POSIX doesn't even specify \? and \+; those are GNU
extensions.

-- 
Glynn Clements <glynn at gclements.plus.com>

