[GRASS-dev] Re: regex problem for r.in.wms

Hamish hamish_b at yahoo.com
Mon Mar 10 19:22:04 EDT 2008

> Hamish writes:
> > Hi, re. r.in.wms XML parsing code for layers with spaces in the
> > name given some text like this:
>  > DATA="<Name>Foo Bar Baz</Name>"
>  > echo "$DATA" | sed -e "s/<Name>\s*\(\w*\)/~\1~/g" \
>  >   -e "s/<\/Name>//g"
>  > you get ~Foo~ Bar Baz
>  > instead of ~Foo Bar Baz~
>  > how to fix that regex?
> 	First of all, we expand `\w' into ``any letter or digit or the
> 	underscore character'' [1]:
> echo "$DATA" \
>     | sed -e "s/<Name>\s*\([[:alpha:][:digit:]_]*\)/~\1~/g" \
>           -e "s/<\/Name>//g"
> ## => ~Foo~ Bar Baz
> 	Then, we add `[:space:]' to the []-set:
> echo "$DATA" \
>     | sed -e "s/<Name>\s*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g" \
>           -e "s/<\/Name>//g"
> ## => ~Foo Bar Baz~


> 	Finally, I'd recommend using single quotes for the Sed program,
> 	since it has no Shell substitutions contained within:

right, good idea.
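For reference, here is the same two-expression program with single quotes, so the shell passes every backslash through to sed untouched. This is a sketch that assumes GNU sed, because `\s` is a GNU extension (`[[:space:]]*` is the portable spelling).

```shell
DATA="<Name>Foo Bar Baz</Name>"

# Single quotes: no shell substitution happens inside the sed program.
# \s* skips leading whitespace (GNU sed extension); the bracket
# expression keeps letters, digits, whitespace and underscores.
echo "$DATA" \
    | sed -e 's/<Name>\s*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g' \
          -e 's/<\/Name>//g'
## => ~Foo Bar Baz~
```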

> [1] GNU Sed manual (for GNU Sed 4.1.5)

I spent a little time on this site yesterday sharpening up my regex:

[time spent learning regex is time well spent!]

and committed a fix:

In the end I replaced it with "continue until you find an open <".
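Going only by that description, a "stop at the next <" match can be written as the negated bracket expression `[^<]*`. The sketch below is a reconstruction of the idea, not necessarily the exact committed code; it uses `[[:space:]]*` in place of `\s*` to stay POSIX, and the layer name "Foo-Bar (v2)" is hypothetical, chosen to contain punctuation as well as spaces.

```shell
# Hypothetical layer name with punctuation as well as spaces.
DATA="<Name>Foo-Bar (v2)</Name>"

# Character-class version: the capture stops at '-', which is not in the set.
echo "$DATA" \
    | sed -e 's/<Name>[[:space:]]*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g' \
          -e 's/<\/Name>//g'
## => ~Foo~-Bar (v2)

# "Continue until an open <" version: keeps everything up to the next '<'.
echo "$DATA" \
    | sed -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
          -e 's/<\/Name>//g'
## => ~Foo-Bar (v2)~
```

The negated set avoids having to enumerate which characters a server might put in a <Name> at all.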

I guess another way to do [[:alpha:][:digit:][:space:]_]* would be:

I don't see much in the OGC's WMS spec about allowed chars,
although I didn't study it that closely.

But it does say the <Name> field is for computer-to-computer
communication while <Title> is the human-readable version, and it gives
a multi-word example <Title> with an approximately six-letter uppercase
alphabetic code for <Name>. And that is exactly what the S-57 data
standard provides:

So in this case I consider NOAA's ArcIMS server to be abusing the
<Name> field, using it more as a <Title> than it

r.in.wms -l mapserv="$SERVER"

         --SUBMARINE_ON LAND PIPELINE_point   #   --<Title>

The S-57 acronym PISOL is right there to use, but ............
(no, it doesn't work to just use PISOL as the <Name>)

Otherwise I'd worry about a literal <comment> in a <Name>, and how to
match until "</Name>" not just "<". But as it is I hope no one would be
so silly as to use < in a <Name> field.
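To make the concern concrete: in well-formed XML a bare "<" may not appear in character data (it must be written as &lt;), so the realistic risk is markup such as a comment inside the element. A sketch with a hypothetical input shows `[^<]*` ending the capture at the comment; the comment-stripping workaround is an assumption of mine and only holds if no comment contains ">".

```shell
# Hypothetical <Name> containing an XML comment.
DATA='<Name>Foo <!-- note --> Bar</Name>'

# [^<]* stops at the '<' that opens the comment,
# so only "Foo " ends up between the tildes.
echo "$DATA" \
    | sed -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
          -e 's/<\/Name>//g'
## => ~Foo ~<!-- note --> Bar

# Workaround sketch: delete comments first (assumes no '>' inside them).
echo "$DATA" \
    | sed -e 's/<!--[^>]*-->//g' \
          -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
          -e 's/<\/Name>//g'
## => ~Foo  Bar~
```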

For the explicit [[:alpha:][:digit:][:space:]_]* case, could there be
i18n/Unicode issues?
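There can be, because POSIX classes such as [:alpha:] are defined by the current locale (LC_CTYPE). A minimal sketch, assuming GNU sed: in the C locale the two UTF-8 bytes of "é" are not letters, so the class-based capture stops early, while `[^<]*` does not care what the locale calls a letter. In a UTF-8 locale GNU sed will usually treat "é" as alphabetic, so results can differ between machines. The layer name is hypothetical.

```shell
# Hypothetical layer name with a non-ASCII letter, given as UTF-8 bytes.
DATA=$(printf '<Name>Caf\303\251 Bar</Name>')

# C locale: [:alpha:] covers ASCII letters only, so the capture stops at "Caf".
printf '%s\n' "$DATA" \
    | LC_ALL=C sed -e 's/<Name>[[:space:]]*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g' \
                   -e 's/<\/Name>//g'
## => ~Caf~é Bar

# [^<]* keeps every byte up to the next '<', in any locale.
printf '%s\n' "$DATA" \
    | LC_ALL=C sed -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
                   -e 's/<\/Name>//g'
## => ~Café Bar~
```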


