[Qgis-user] Automagically remove html from attribute?
Fernando M. Roxo da Motta
petro at roxo.org
Sat Oct 6 14:16:10 PDT 2018
On Sat, 2018-10-06 at 17:35 +0200, Bernd Vogelgesang wrote:
> > On Sat, 2018-10-06 at 12:45 +0200, Bernd Vogelgesang wrote:
> > > Hi,
> > >
==================8<-------------- clip here --------------
> > A REGEXP like "<[^>]+>" should match each pair of angle brackets
> > together with everything between them. It may be necessary to escape
> > some of the symbols in the REGEXP to avoid misinterpretation.
> >
> > Avoid a REGEXP like "<.*>", because it will match everything from
> > the first "<" to the last ">", which may include other "<" and ">"
> > characters.
> >
> > HTH
>
> Hi Fernando,
> many thanks for your hint. REGEX is definitely the way to go, if only
> it were a little more intuitive.
I hope this is not considered too off-topic.
Trying to make it a little more understandable:
The square bracket pattern ( [...] ) matches any _one_ character among
the listed ones. For example, the pattern "<[a2jZ]" matches an open angle
bracket (<) followed by exactly one of the listed characters: <a, <2, <j
or <Z. If the input string is (e.g.) "<jja2ZZ", the example pattern will
match "<j" only. To be clear: however many characters are listed inside
"[...]", the pattern matches only one character.
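A quick way to see the one-character rule is to try the example pattern in Python's `re` module. This is only an illustration: for simple patterns like this one it should behave the same as the regular expressions used in QGIS expressions, but that equivalence is an assumption, not something tested against QGIS here.

```python
import re

# The bracket expression [a2jZ] matches exactly ONE character from the set,
# so each match is "<" plus a single listed character.
pattern = r"<[a2jZ]"

print(re.findall(pattern, "<jja2ZZ"))         # ['<j']
print(re.findall(pattern, "<a <2 <j <Z <b"))  # ['<a', '<2', '<j', '<Z']
```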
OK, to make it a little more flexible: if the first character inside the
square brackets is a caret (^), it inverts the meaning, that is, it matches
any character except those listed.
So the pattern used, "<[^>]", means one open angle bracket (<) followed by
any character except the close angle bracket (>). As above, this matches
just two characters.
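The negated set can be checked the same way, again using Python's `re` purely as an illustration (assumed, not verified, to agree with QGIS for this pattern). Each match is still exactly two characters, and "<" immediately followed by ">" is rejected.

```python
import re

# A leading caret inside the brackets negates the set:
# [^>] matches any single character EXCEPT ">".
pattern = r"<[^>]"

print(re.findall(pattern, "<b>bold</b>"))  # ['<b', '</']
print(re.findall(pattern, "<>"))           # []
```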
After a pattern you may add a repetition mark so that it keeps matching
for as long as it can. For example, the plus sign (+) makes the preceding
pattern ([^>]) repeat as long as the character at the current position is
not a close angle bracket, provided that at least the first match
succeeds. This pattern will not match the sequence "<>", because the "+"
demands at least one match. If zero matches are acceptable, you need a
different repeater, the asterisk (*), making it "<[^>]*"; once the closing
bracket is added, this will also match the sequence "<>".
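The difference between the two repeaters shows up on an empty tag. This sketch appends the closing angle bracket (>) to complete both patterns and uses Python's `re` for illustration only (its behaviour here is assumed to match QGIS's `regexp_replace`).

```python
import re

text = "keep <> this <b>text</b>"

# "+" requires at least one non-">" character, so the empty tag "<>" is skipped.
print(re.findall(r"<[^>]+>", text))  # ['<b>', '</b>']

# "*" also accepts zero characters, so "<>" is matched as well.
print(re.findall(r"<[^>]*>", text))  # ['<>', '<b>', '</b>']
```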
Finally, we close the pattern with the closing angle bracket (>).
In plain English, the complete pattern reads as:
Match a string that starts with an open angle bracket, continues with one
or more characters other than the close bracket, and ends with a close
bracket.
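To connect this with the earlier warning about "<.*>", here is the complete pattern next to the greedy one on a small HTML string, once more using Python's `re` as a stand-in (assumed equivalent for these patterns). The negated set stops each match at the first ">", while ".*" runs on to the last one.

```python
import re

text = "<p>Hello <b>world</b></p>"

# "<[^>]+>": each match stops at the first ">", so every tag is found separately.
print(re.findall(r"<[^>]+>", text))  # ['<p>', '<b>', '</b>', '</p>']

# "<.*>": ".*" is greedy and runs to the LAST ">", swallowing the text in between.
print(re.findall(r"<.*>", text))     # ['<p>Hello <b>world</b></p>']
```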
>
> regexp_replace( "desc",'<[^>]+>','')
>
> in the field calculator did the trick for me for all entries with
> correct HTML. So only a few entries with crippled HTML are left to
> process manually.
If the crippled ones contain empty tags like "<>", replacing "+" with "*"
should do the trick.
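The field-calculator expression regexp_replace("desc", '<[^>]+>', '') can be mimicked in plain Python with `re.sub`, which makes it easy to test both variants on a few sample strings before touching the layer. The sample values and the `strip_tags` helper are hypothetical, and the assumption is that Python's regex semantics agree with QGIS for these simple patterns.

```python
import re

# Hypothetical sample values standing in for the "desc" attribute.
descriptions = [
    "<p>Oak <i>Quercus robur</i></p>",
    "broken <> markup <b>here</b>",
]

def strip_tags(value, allow_empty=False):
    """Hypothetical helper: delete <...> tags like regexp_replace(..., '<[^>]+>', '').

    With allow_empty=True the "*" repeater is used, so empty "<>" tags go too.
    """
    pattern = r"<[^>]*>" if allow_empty else r"<[^>]+>"
    return re.sub(pattern, "", value)

for d in descriptions:
    print(repr(strip_tags(d)), repr(strip_tags(d, allow_empty=True)))
```

Note that deleting a tag leaves the surrounding spaces in place, so a value like "broken <> markup" ends up with a double space after the "*" variant.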
HTH
>
> Thanx a lot,
> Bernd
>
> >
> > > Is there e.g. a way to search for < and > and then delete them and
> > > all text within programmatically?
> > >
> > >
> > > Cheers,
> > >
> > > Bernd
> > >
> > > _______________________________________________
> > > Qgis-user mailing list
> > > Qgis-user at lists.osgeo.org
> > > List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
> > > Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
> >
> > Roxo
> >
>
Roxo
--
---------------- Non luctari, ludare -------------------+ WYSIWYG
Fernando M. Roxo da Motta <petro at roxo.org> | Editor?
Except where explicitly stated I speak on my own behalf.| VI !!
PU5RXO | I see text,
------------ Quis custodiet ipsos custodes?-------------+ I get text!