[Qgis-user] Automagically remove html from attribute?

Bernd Vogelgesang bernd.vogelgesang at gmx.de
Sun Oct 7 04:21:07 PDT 2018



> On Sat, 2018-10-06 at 17:35 +0200, Bernd Vogelgesang wrote:
>>> On Sat, 2018-10-06 at 12:45 +0200, Bernd Vogelgesang wrote:
>>>> Hi,
>>>>
> ==================8<-------------- clip here --------------
>>>     A REGEXP like  "<[^>]+>" should match all contents between a
>>> consecutive pair of angle brackets.   It may be necessary to escape
>>> some of the symbols in REGEXP to avoid misinterpretation.
>>>
>>>     It is necessary to avoid REGEXP like "<.*>" because it will match
>>> everything from the first "<" to the last ">", that may include other
>>> characters "<" and ">".
>>>
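
To see the difference between the two patterns side by side, here is a minimal sketch. It assumes Python's `re` module purely for illustration (the thread itself is about the QGIS field calculator), and the sample string is made up:

```python
import re

# Hypothetical attribute value containing two HTML tags.
sample = "<b>Old mill</b> near the river"

# Greedy: ".*" runs from the first "<" to the last ">" in the string,
# so the text between the two tags is swallowed as well.
greedy = re.sub(r"<.*>", "", sample)

# Negated class: each match stops at the first ">" it meets,
# so only the tags themselves are removed.
per_tag = re.sub(r"<[^>]+>", "", sample)

print(repr(greedy))
print(repr(per_tag))
```

With this input the greedy pattern also deletes "Old mill", while the bracket pattern keeps it.
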
>>>     HTH
>> Hi Fernando,
>> many thanks for your hint. REGEX is definitely the way to go, if only it
>> were a little more intuitive.
>    I hope this is not considered too offtopic.
>
>    Trying to make it a little more understandable:
>
>    The square bracket pattern ( [...] ) matches any _one_ character among
> the listed ones.   For example, if the pattern is "<[a2jZ]" it will match
> one character after the "<" if it is one of those listed.   In the given
> example it will match an angle bracket (<) followed by one of the listed
> chars:  <a or <2 or <j or <Z.   If the input string is (e.g.)
> "<jja2ZZ" the example pattern will match "<j" only.  Just to make it clear
> the content of the "[...]" pattern will match only one character.
>
>    Ok, in order to make it a little more flexible, if the first character
> in the square brackets is a caret (^) it will invert the meaning, that is
> it will match any character, except those listed.
>
>    So the pattern used, "<[^>]", means one open angle bracket (<) followed
> by any one character except the close angle bracket (>).  As above, this
> matches just a pair of characters.
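
The one-character behaviour of the bracket patterns can be checked with a small sketch. Python's `re` module is assumed here only as a convenient test bench, and the "<br>" input is a made-up example:

```python
import re

# "<[a2jZ]": "<" followed by exactly one of a, 2, j, Z.
listed = re.match(r"<[a2jZ]", "<jja2ZZ").group(0)

# "<[^>]": "<" followed by exactly one character that is not ">".
negated = re.match(r"<[^>]", "<br>").group(0)

# When ">" comes directly after "<" there is nothing for the
# negated class to consume, so there is no match at all.
empty_tag = re.match(r"<[^>]", "<>")

print(listed, negated, empty_tag)
```

In both matching cases the result is two characters long: the "<" plus the single character accepted by the square brackets.
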
>
>    After a pattern you may use a repeat mark to make it apply for as long
> as it keeps matching.   For example, the plus sign (+) makes the prior
> pattern ([^>]) repeat as long as the character at the position is not a
> close angle bracket, provided that at least the first match is achieved.
> This pattern will not get the sequence "<>", because the "+" demands at
> least one match.   If zero matches is an option, it will be necessary to
> use a different repeater, the asterisk (*), making it "<[^>]*", which also
> accepts the sequence "<>" once the closing bracket is added.
>
>    And we close the pattern sequence with the closing angle bracket (>).
>
>    In plain English, the complete pattern reads as:
>
>    Matches one string starting with one open angle bracket followed by any
> number of characters different from the close bracket, and ending with one
> close bracket.
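
A short sketch of the two repeaters on the complete pattern, again assuming Python's `re` module for illustration and a made-up input string:

```python
import re

# Made-up text with one empty tag and one ordinary tag.
text = "a<>b<i>c"

# "+" needs at least one character between the brackets.
plus_matches = re.findall(r"<[^>]+>", text)

# "*" is satisfied with zero characters between the brackets.
star_matches = re.findall(r"<[^>]*>", text)

print(plus_matches)
print(star_matches)
```

Only the "*" variant picks up the empty "<>".
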
Great explanations! Thanks a lot. And I think it is not off topic. Off
topic would be answers like "ah, but that's so easy, just use REGEX..."
>
>>    regexp_replace( "desc",'<[^>]+>','')
>>
>> in the field calculator did the trick for me for all entries with
>> correct html. So only a few entries with crippled html are left to
>> process manually.
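
Outside the field calculator, the same clean-up plus a list of the entries that still need a manual look could be sketched roughly like this. Everything here is an assumption for illustration: plain Python instead of QGIS, a hypothetical `strip_tags` helper, and invented "desc" values:

```python
import re

# Same pattern as in the regexp_replace expression above.
TAG = re.compile(r"<[^>]+>")

def strip_tags(value):
    """Remove every well-formed <...> tag from value (hypothetical helper)."""
    return TAG.sub("", value)

# Made-up "desc" values; the last one has a tag that is never closed.
values = ["<p>Bridge</p>", "Plain text", "Ruin <b unfinished"]

cleaned = [strip_tags(v) for v in values]

# Anything still holding an angle bracket was not well-formed html
# and is left over for manual processing.
leftovers = [v for v in cleaned if "<" in v or ">" in v]

print(cleaned)
print(leftovers)
```
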
>    If the crippled ones are like "<>", replacing "+" with "*" should do
> the trick.
>
>    HTH
>
>
>> Thanx a lot,
>> Bernd
>>
>>>> Is there e.g. a way to search for < and > and then delete them and
>>>> all text within programmatically?
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Bernd
>>>>
>>>> _______________________________________________
>>>> Qgis-user mailing list
>>>> Qgis-user at lists.osgeo.org
>>>> List info: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>> Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-user
>>>     Roxo
>>>
>
>    Roxo
>


