[GRASS-dev] Re: regex problem for r.in.wms
Hamish
hamish_b at yahoo.com
Mon Mar 10 19:22:04 EDT 2008
> Hamish writes:
> > Hi, re. r.in.wms XML parsing code for layers with spaces in the
> > name given some text like this:
>
> > DATA="<Name>Foo Bar Baz</Name>"
> > echo "$DATA" | sed -e "s/<Name>\s*\(\w*\)/~\1~/g" \
> > -e "s/<\/Name>//g"
>
> > you get ~Foo~ Bar Baz
> > instead of ~Foo Bar Baz~
> > how to fix that regex?
>
Ivan:
> First of all, we expand `\w' into ``any letter or digit or the
> underscore character'' [1]:
>
> echo "$DATA" \
> | sed -e "s/<Name>\s*\([[:alpha:][:digit:]_]*\)/~\1~/g" \
> -e "s/<\/Name>//g"
> ## => ~Foo~ Bar Baz
>
> Then, we add `[:space:]' to the []-set:
>
> echo "$DATA" \
> | sed -e "s/<Name>\s*\([[:alpha:][:digit:][:space:]_]*\)/~\1~/g" \
> -e "s/<\/Name>//g"
> ## => ~Foo Bar Baz~
thanks.
> Finally, I'd recommend to use single quotes for the Sed program,
> since it has no Shell substitutions contained within:
right, good idea.
> [1] GNU Sed manual (for GNU Sed 4.1.5.)
I spent a little time on this site yesterday sharpening up my regex:
http://www.regular-expressions.info/quickstart.html
[time spent learning regex is time well spent!]
and committed a fix:
http://trac.osgeo.org/grass/changeset/30522
In the end I replaced it with "continue until you find an opening
angle bracket":
[^<]*
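A minimal sketch of that pattern applied to the test string from the top
of the thread. This is not the code from the changeset; the RESULT
variable is just for illustration, and [[:space:]] stands in for \s so
that it does not rely on the GNU extension:

```shell
DATA="<Name>Foo Bar Baz</Name>"

# Capture everything after <Name> up to (not including) the next "<",
# then strip the closing tag.
RESULT=$(echo "$DATA" \
  | sed -e 's/<Name>[[:space:]]*\([^<]*\)/~\1~/g' \
        -e 's/<\/Name>//g')

echo "$RESULT"
## => ~Foo Bar Baz~
```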
I guess another way to do [[:alpha:][:digit:][:space:]_]* would be:
[\w\d\s]*
?
I don't see much in the OGC's WMS spec about allowed chars,
although I didn't study it that closely.
http://www.opengeospatial.org/standards/wms
but it does say the <Name> field is for computer to computer
communication while <Title> is the human-readable version, and gives a
multiword example <Title> with an approximately six-letter uppercase alphabetic code
for <Name>. And that is exactly what the S-57 data standard provides:
http://www.s-57.com/
So in this case I consider that NOAA's ArcIMS server is just abusing
what the <Name> field should be, using it more as a <Title> than it
should.
example:
SERVER="http://ocs-spatial.ncd.noaa.gov/wmsconnector/com.esri.wms.Esrimap/encdirect?"
r.in.wms -l mapserv="$SERVER"
LAYER:
~SUBMARINE_ON LAND PIPELINE_point(PISOL)~ # ~<Name>~
--SUBMARINE_ON LAND PIPELINE_point # --<Title>
The S-57 acronym PISOL is right there to use, but ............
(no, it doesn't work to just use PISOL as the <Name>)
Otherwise I'd worry about a literal <comment> in a <Name>, and how to
match until "</Name>" not just "<". But as it is I hope no one would be
so silly as to use < in a <Name> field.
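For what it's worth, a rough sketch of matching up to the closing tag
rather than the next "<". It assumes only one <Name> element per line:
sed has no non-greedy match, so with several elements on a line the
greedy .* would run on to the last </Name>. The layer name here is made
up for illustration:

```shell
# Hypothetical layer name containing a literal "<"
DATA='<Name>Depth < 10m</Name>'

# Greedy .* backtracks just far enough for </Name> to match,
# so the literal "<" inside the name is kept.
RESULT=$(echo "$DATA" \
  | sed -e 's/<Name>[[:space:]]*\(.*\)<\/Name>/~\1~/')

echo "$RESULT"
## => ~Depth < 10m~
```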
For the explicit [[:alpha:][:digit:][:space:]_]* case I worry about
possible i18n/Unicode issues?
Hamish