[GRASS-dev] Re: regex problem for r.in.wms

Hamish hamish_b at yahoo.com
Mon Mar 10 19:22:04 EDT 2008


> Hamish writes:
> > Hi, re. r.in.wms XML parsing code for layers with spaces in the
> > name given some text like this:
> 
>  > DATA="<Name>Foo Bar Baz</Name>"
>  > echo "$DATA" | sed -e "s/<Name>\s*\(\w*\)/~\1~/g" \
>  >   -e "s/<\/Name>//g"
> 
>  > you get ~Foo~ Bar Baz
>  > instead of ~Foo Bar Baz~
>  > how to fix that regex?
>
Ivan:
> 	First of all, we expand `\w' into ``any letter or digit or the
> 	underscore character'' [1]:
> 
> echo "$DATA" \
>     | sed -e "s/<Name>\s*\([[:alpha:][:digit:]_]*\)/~\1~/g" \
>           -e "s/<\/Name>//g"
> ## => ~Foo~ Bar Baz
> 
> 	Then, we add `[:space:]' to the []-set:
> 
> echo "$DATA" \
>     | sed -e "s/<Name>\s*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g" \
>           -e "s/<\/Name>//g"
> ## => ~Foo Bar Baz~

thanks.

> 	Finally, I'd recommend to use single quotes for the Sed program,
> 	since it has no Shell substitutions contained within:

right, good idea.
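
For illustration, a sketch of the single-quoted form. The function name
extract_name is made up for this example, and [[:space:]]* stands in for
\s* so that the expression does not rely on a GNU Sed extension:

```shell
# extract_name is a hypothetical helper, not part of r.in.wms.
extract_name() {
    printf '%s\n' "$1" \
        | sed -e 's/<Name>[[:space:]]*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g' \
              -e 's/<\/Name>//g'
}

extract_name '<Name>Foo Bar Baz</Name>'
## => ~Foo Bar Baz~
```

With single quotes the shell passes the backslashes and brackets through
to Sed untouched, so nothing needs double escaping.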

> [1] GNU Sed manual (for GNU Sed 4.1.5.)

I spent a little time on this site yesterday sharpening up my regex:
   http://www.regular-expressions.info/quickstart.html

[time spent learning regex is time well spent!]

and committed a fix:
  http://trac.osgeo.org/grass/changeset/30522

In the end I replaced it with "continue until you find an opening
angle bracket":
 [^<]*
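
A minimal sketch of that idea, assuming the same two-expression Sed call
as earlier in the thread. The function name name_until_bracket is
hypothetical, and the expression actually committed in the changeset may
differ in its details:

```shell
# name_until_bracket is a hypothetical helper for illustration only.
# [^<]* consumes every character up to, but not including, the next "<".
name_until_bracket() {
    printf '%s\n' "$1" \
        | sed -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
              -e 's/<\/Name>//g'
}

name_until_bracket '<Name>SUBMARINE_ON LAND PIPELINE_point(PISOL)</Name>'
## => ~SUBMARINE_ON LAND PIPELINE_point(PISOL)~
```

Unlike an explicit list of letters, digits, spaces and underscore, this
also lets through the parentheses in a name like the one above.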

I guess another way to do [[:alpha:][:digit:][:space:]_]* would be:
 [\w\d\s]*
?
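
As far as I understand the POSIX rules for bracket expressions, probably
not in Sed: inside [] the backslash is an ordinary character, so \w and
\s lose their meaning there. A quick check, as a sketch:

```shell
# Inside a bracket expression the backslash is literal, so [\w] matches
# either a backslash or the letter "w", not "any word character".
printf 'w\n' | sed -e 's/[\w]/X/'    # the literal "w" is replaced
printf 'F\n' | sed -e 's/[\w]/X/'    # "F" is not matched and stays as is
```

Other regex flavours (Perl-style ones, for instance) do accept such
escapes inside a character class, which may be where the notation comes
from.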

I don't see much in the OGC's WMS spec about allowed chars,
although I didn't study it that closely.
  http://www.opengeospatial.org/standards/wms

but it does say the <Name> field is for computer-to-computer
communication while <Title> is the human-readable version, and gives a
multiword example <Title> with a roughly six-letter uppercase alphabetic
code for <Name>. And that is exactly what the S-57 data standard provides:
  http://www.s-57.com/

So in this case I consider that NOAA's ArcIMS server is just abusing
what the <Name> field should be, using it more as a <Title> than it
should.

example:
SERVER="http://ocs-spatial.ncd.noaa.gov/wmsconnector/com.esri.wms.Esrimap/encdirect?"
r.in.wms -l mapserv="$SERVER"

LAYER:
~SUBMARINE_ON LAND PIPELINE_point(PISOL)~     # ~<Name>~
         --SUBMARINE_ON LAND PIPELINE_point   #   --<Title>

The S-57 acronym PISOL is right there to use, but ............
(no, it doesn't work to just use PISOL as the <Name>)


Otherwise I'd worry about a literal <comment> in a <Name>, and how to
match until "</Name>" not just "<". But as it is I hope no one would be
so silly as to use < in a <Name> field.
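
If it ever did come up, one way to match up to the closing tag rather
than the first "<" might look like the sketch below. The function name
name_until_close is hypothetical, and it assumes at most one <Name>
element per input line, since .* is greedy and would run on to the last
</Name> on the line:

```shell
# name_until_close is a hypothetical helper for illustration only.
# The closing tag is part of the match, so a stray "<" inside the name
# does not end the capture early.
name_until_close() {
    printf '%s\n' "$1" \
        | sed -e 's/<Name>[[:space:]]*\(.*\)<\/Name>/~\1~/'
}

name_until_close '<Name>depth < 10 m</Name>'
## => ~depth < 10 m~
```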

For the explicit [[:alpha:][:digit:][:space:]_]* case I worry about
possible i18n/Unicode issues?



Hamish


