[Qgis-user] Automagically remove html from attribute?

Fernando M. Roxo da Motta petro at roxo.org
Sat Oct 6 14:16:10 PDT 2018


On Sat, 2018-10-06 at 17:35 +0200, Bernd Vogelgesang wrote:
> > On Sat, 2018-10-06 at 12:45 +0200, Bernd Vogelgesang wrote:
> > > Hi,
> > > 
==================8<-------------- clip here --------------
> >    A REGEXP like  "<[^>]+>" should match all contents between a
> > consecutive pair of angle brackets.   It may be necessary to escape
> > some of the symbols in REGEXP to avoid misinterpretation.
> > 
> >    It is necessary to avoid REGEXP like "<.*>" because it will match
> > everything from the first "<" to the last ">", that may include other
> > characters "<" and ">".
> > 
> >    HTH
> 
> Hi Fernando,
> many thanks for your hint. REGEX is definitely the way to go, if 
> only it were a little more intuitive.

  I hope this is not considered too off-topic.

  Trying to make it a little more understandable:

  The square bracket pattern ( [...] ) matches exactly _one_ character
from those listed inside it.   For example, the pattern "<[a2jZ]"
matches an open angle bracket (<) followed by one of the listed
characters:  <a or <2 or <j or <Z.   If the input string is (e.g.)
"<jja2ZZ", the example pattern matches "<j" only.   To be clear, a
"[...]" pattern on its own always matches a single character.
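
  A small sketch of the example above.   It uses Python's "re" module as
a stand-in for the QGIS expression engine (an assumption made only for
illustration; the variable names are invented here):

```python
import re

# Python's re module stands in for the QGIS regex engine here.
pattern = re.compile(r"<[a2jZ]")

# The class [a2jZ] consumes exactly one character after the "<".
matched_text = pattern.search("<jja2ZZ").group()   # "<j"

# "x" is not among the listed characters, so nothing matches.
no_match = pattern.search("<xja2ZZ")               # None
```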

  To make it a little more flexible: if the first character inside the
square brackets is a caret (^), the meaning is inverted, that is, the
pattern matches any one character except those listed.

  So the pattern "<[^>]" means one open angle bracket (<) followed by any
one character except the close angle bracket (>).  As above, this matches
just a pair of characters.
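
  The negated class can be tried out the same way.   Again this is only a
sketch in Python's "re" module, assumed here to behave like the QGIS
engine for such a simple pattern:

```python
import re

# [^>] matches any single character except the close angle bracket.
negated = re.compile(r"<[^>]")

# Only the pair "<b" is matched, not the whole tag.
first_pair = negated.search("<b>bold</b>").group()   # "<b"

# In "<>" the character after "<" is ">", so there is no match at all.
empty_tag = negated.search("<>")                     # None
```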

  After a pattern you may use a repeat mark to make it keep matching for
as long as it can.   For example, the plus sign (+) makes the prior pattern
([^>]) repeat as long as the character at the current position is not a
close angle bracket, provided that at least one match is achieved.   This
pattern will not match the sequence "<>", because the "+" demands at least
one character after the "<".   If zero matches is an option, it is
necessary to use a different repeater, the asterisk (*), giving "<[^>]*",
so that, once the closing bracket is added, the sequence "<>" matches too.
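
  The difference between the two repeaters, sketched with Python's "re"
module (a stand-in for the QGIS engine; the sample strings are made up):

```python
import re

one_or_more = re.compile(r"<[^>]+")    # at least one character after "<"
zero_or_more = re.compile(r"<[^>]*")   # zero characters are acceptable too

# Both repeaters stop right before the first ">".
plus_on_tag = one_or_more.search("<div class=x>text").group()  # "<div class=x"

# "+" needs one non-">" character after "<", and "<>" has none.
plus_on_empty = one_or_more.search("<>")                       # None

# "*" accepts zero repetitions, so the "<" alone is matched.
star_on_empty = zero_or_more.search("<>").group()              # "<"
```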

  And we close the pattern sequence with the closing angle bracket (>).

  In plain English, the complete pattern reads as:

  Matches one string starting with one open angle bracket, followed by one
or more characters other than the close angle bracket, and ending with one
close angle bracket.
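
  To see the complete pattern at work, and why the greedy "<.*>" should be
avoided, here is a minimal sketch in Python's "re" module (assumed to be
close enough to the QGIS engine for this purpose; strip_tags and the sample
text are hypothetical):

```python
import re

def strip_tags(text):
    """Remove every <...> sequence that has at least one character inside."""
    return re.sub(r"<[^>]+>", "", text)

sample = "<p>Hello <b>world</b></p>"

# Each tag is matched on its own, so the text between tags survives.
clean = strip_tags(sample)                      # "Hello world"

# The greedy pattern runs from the first "<" to the last ">" and
# swallows the text in between as well.
greedy_result = re.sub(r"<.*>", "", sample)     # ""
```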

> 
>   regexp_replace( "desc",'<[^>]+>','')
> 
> in the field calculator did the trick for me for all entries with 
> correct html. So only few entries with crippled html left to process 
> manually.

  If the crippled ones are like "<>", replacing the "+" with "*" should do
the trick.
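
  A quick check of that suggestion, again sketched with Python's "re"
module instead of the field calculator (the attribute value is invented
for the example):

```python
import re

value = "Park<> <i>north</i> gate"

# With "+", the empty "<>" is left behind.
with_plus = re.sub(r"<[^>]+>", "", value)   # "Park<> north gate"

# With "*", the empty "<>" is removed as well.
with_star = re.sub(r"<[^>]*>", "", value)   # "Park north gate"
```

  In the field calculator the corresponding change would be
regexp_replace( "desc",'<[^>]*>','').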

  HTH


> 
> Thanx a lot,
> Bernd
> 
> > 
> > > Is there e.g. a way to search for < and > and then delete them and
> > > all text within programmatically?
> > > 
> > > 
> > > Cheers,
> > > 
> > > Bernd
> > > 
> > > _______________________________________________
> > > Qgis-user mailing list
> > > Qgis-user at lists.osgeo.org
> > > List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
> > > Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
> > 
> >    Roxo
> > 
> 


  Roxo

-- 
---------------- Non luctari, ludare -------------------+ WYSIWYG
Fernando M. Roxo da Motta <petro at roxo.org>              | Editor?
Except where explicitly stated I speak on my own behalf.|  VI !!
                PU5RXO                                  | I see text,
------------ Quis custodiet ipsos custodes?-------------+ I get text!
 

