<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  </head>
  <body>
    <p><br>
    </p>
    <div class="moz-cite-prefix">On 15/07/2021 at 22:30, Nyall Dawson
      wrote:<br>
    </div>
    <blockquote type="cite"
cite="mid:CAB28Asj6KYZ3qha=LCUvaYJVDVPiQxAxnhgay_f1edwFO5uZDg@mail.gmail.com">
      <div dir="auto">
        <div><br>
          <br>
          <div class="gmail_quote">
            <div dir="ltr" class="gmail_attr">On Fri, 16 Jul 2021, 4:52
              am Andrea Giudiceandrea, &lt;<a
                href="mailto:andreaerdna@libero.it" target="_blank"
                rel="noreferrer" moz-do-not-send="true">andreaerdna@libero.it</a>&gt;
              wrote:<br>
            </div>
            <blockquote class="gmail_quote">&gt; Isn't this limitation
              ultimately that GDAL isn't reading the encoding<br>
              &gt; correctly? (Or perhaps it's a limitation in the
              underlying freexl<br>
              &gt; library...)<br>
              &gt; <br>
              &gt; Nyall<br>
              <br>
              <br>
              Hi Nyall and Andreas,<br>
              it seems to me GDAL/OGR [1] reads XLS and XLSX files with
              their respective <br>
              proper encoding [2] and ogrinfo outputs the text in UTF-8
              for both <br>
              formats.<br>
            </blockquote>
          </div>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">I don't think that's completely correct --
          looking at the freexl documentation it seems that only some
          xls file versions are utf8, and others have a codepage
          indicating the encoding which needs to be read from the xls
          metadata:</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">"Any BIFF version from BIFF2 to BIFF5 simply
          supports CodePage based character encoding, i.e. each
          character simply requires 8 bits to be represented (single
          byte). Correct representation of characters requires knowing
          which one CodePage table has to be applied. This can
          be determined from the workbook or worksheet metadata (it is
          the CODEPAGE record).</div>
        <div dir="auto">BIFF8 is much more sophisticated, since any text
          string is usually encoded as Unicode in UTF-16 Little Endian
          [UTF-16LE] format. This encoding is a multi-byte
          encoding (two bytes are required to represent a single
          character), but being universal no character table is
          required."</div>
      </div>
    </blockquote>
    Yes, but FreeXL does the recoding to UTF-8 itself, so the text GDAL
    gets from it is already UTF-8 whatever the BIFF version.<br>
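    <p>As a rough sketch of what that recoding amounts to (in Python,
      not FreeXL's actual C code; <code>recode_to_utf8</code> and its
      parameters are hypothetical names chosen for illustration, and
      cp1252 is only an assumed example code page):</p>

```python
def recode_to_utf8(raw, biff_version, codepage="cp1252"):
    """Hypothetical helper: return UTF-8 bytes for a raw XLS text string."""
    if biff_version == 8:
        # BIFF8 text is usually UTF-16LE, so no code page table is needed.
        text = raw.decode("utf-16-le")
    else:
        # BIFF2 to BIFF5 text is single-byte; the CODEPAGE record tells
        # which table applies (passed in here as an assumed parameter).
        text = raw.decode(codepage)
    return text.encode("utf-8")


# The same word stored the BIFF5 way (cp1252) and the BIFF8 way (UTF-16LE).
legacy = "caf\u00e9".encode("cp1252")
modern = "caf\u00e9".encode("utf-16-le")

# Both come out as identical UTF-8 bytes after recoding.
assert recode_to_utf8(legacy, 5, "cp1252") == recode_to_utf8(modern, 8)
print(recode_to_utf8(modern, 8).decode("utf-8"))
```

    <p>Whatever the file's internal encoding, the caller only ever sees
      UTF-8, which is why a "system" encoding applied afterwards
      garbles the text.</p>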
    <blockquote type="cite"
cite="mid:CAB28Asj6KYZ3qha=LCUvaYJVDVPiQxAxnhgay_f1edwFO5uZDg@mail.gmail.com">
      <div dir="auto">
        <div dir="auto"><br>
        </div>
        <div dir="auto">Nyall</div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto"><br>
        </div>
        <div dir="auto">
          <div class="gmail_quote">
            <blockquote class="gmail_quote">
              <br>
              By contrast, QGIS correctly imports XLSX files as UTF-8
              encoded, while XLS <br>
              files are wrongly imported as "system" encoded, even
              when UTF-8 [3] is selected <br>
              as the encoding in the Data Source Manager vector import window.<br>
              <br>
              After importing an XLS file, changing the "Data source
              encoding" of the <br>
              layer to "UTF-8" fixes the text encoding in my tests.<br>
              <br>
              So, I think QGIS should also automatically import XLS
              files as UTF-8 <br>
              encoded.<br>
              <br>
              Best regards.<br>
              <br>
              Andrea Giudiceandrea<br>
              <br>
              [1] tested on Windows /OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2
              / Expat 2.1.0 <br>
              and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10<br>
              [2] text in XLS (BIFF8) files is internally encoded in
              UTF-16LE<br>
              [3] by the way, two "UTF-8" codecs are incorrectly
              listed in the <br>
              "Encoding" drop-down list...<br>
              _______________________________________________<br>
              QGIS-Developer mailing list<br>
              <a href="mailto:QGIS-Developer@lists.osgeo.org"
                rel="noreferrer noreferrer" target="_blank"
                moz-do-not-send="true">QGIS-Developer@lists.osgeo.org</a><br>
              List info: <a
                href="https://lists.osgeo.org/mailman/listinfo/qgis-developer"
                rel="noreferrer noreferrer noreferrer" target="_blank"
                moz-do-not-send="true">https://lists.osgeo.org/mailman/listinfo/qgis-developer</a><br>
              Unsubscribe: <a
                href="https://lists.osgeo.org/mailman/listinfo/qgis-developer"
                rel="noreferrer noreferrer noreferrer" target="_blank"
                moz-do-not-send="true">https://lists.osgeo.org/mailman/listinfo/qgis-developer</a><br>
            </blockquote>
          </div>
        </div>
      </div>
      <br>
    </blockquote>
    <pre class="moz-signature" cols="72">-- 
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
  </body>
</html>