<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 15/07/2021 at 22:30, Nyall Dawson
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:CAB28Asj6KYZ3qha=LCUvaYJVDVPiQxAxnhgay_f1edwFO5uZDg@mail.gmail.com">
<div dir="auto">
<div><br>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Fri, 16 Jul 2021, 4:52
am Andrea Giudiceandrea, &lt;<a
href="mailto:andreaerdna@libero.it" target="_blank"
rel="noreferrer" moz-do-not-send="true">andreaerdna@libero.it</a>&gt;
wrote:<br>
</div>
<blockquote class="gmail_quote">> Isn't this limitation
ultimately that GDAL isn't reading the encoding<br>
> correctly? (Or perhaps it's a limitation in the
underlying freexl<br>
> library...)<br>
> <br>
> Nyall<br>
<br>
<br>
Hi Nyall and Andreas,<br>
it seems to me that GDAL/OGR [1] reads XLS and XLSX files with
their respective <br>
proper encodings [2], and ogrinfo outputs the text in UTF-8
for both <br>
formats.<br>
</blockquote>
</div>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">I don't think that's completely correct:
looking at the FreeXL documentation, it seems that only some
XLS file versions are Unicode, and others have a code page
indicating the encoding, which needs to be read from the XLS
metadata:</div>
<div dir="auto"><br>
</div>
<div dir="auto">"Any BIFF version from BIFF2 to BIFF5 simply
supports CodePage based character encoding, i.e. each
character
</div>
<div dir="auto">simply requires 8 bits to be represented (single
byte). Correct representation of characters requires knowing
which
</div>
<div dir="auto">one CodePage table has to be applied. This can
be determined from the workbook or worksheet metadata (it is
the
</div>
<div dir="auto">CODEPAGE record).
</div>
<div dir="auto">BIFF8 is much more sophisticated, since any text
string is usually encoded as Unicode in UTF-16 Little Endian
</div>
<div dir="auto">[UTF-16LE] format. This encoding is a multi-byte
encoding (two bytes are required to represent a single
character),
</div>
<div dir="auto">but being universal no character table is
required."</div>
</div>
</blockquote>
Yes, but FreeXL itself recodes the text to UTF-8.<br>
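<p>A minimal Python sketch of what such a recoding amounts to. This is
only an illustration, not FreeXL's actual code: the helper name
recode_to_utf8, the sample string and the cp1252 code page are
assumptions made up for the example.</p>
<pre>
```python
# Illustration only: NOT FreeXL code, just the same idea in Python.

def recode_to_utf8(raw, source_encoding):
    """Decode raw bytes with the given source encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")

# Hypothetical sample text with one non-ASCII character (a with grave).
text = "Citt\u00e0"

# BIFF2..BIFF5 style: one byte per character; the meaning of each byte
# depends on the CODEPAGE record. cp1252 is an assumed example code page.
biff5_bytes = text.encode("cp1252")

# BIFF8 style: UTF-16LE, two bytes per character for this string.
biff8_bytes = text.encode("utf-16-le")

utf8_from_biff5 = recode_to_utf8(biff5_bytes, "cp1252")
utf8_from_biff8 = recode_to_utf8(biff8_bytes, "utf-16-le")

# Both inputs end up as the same UTF-8 byte string.
same = utf8_from_biff5 == utf8_from_biff8
```
</pre>
<p>Once the recoding has been done, the consumer of the strings receives
UTF-8 in both cases and no longer needs to know which BIFF version or
code page the file used.</p>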
<blockquote type="cite"
cite="mid:CAB28Asj6KYZ3qha=LCUvaYJVDVPiQxAxnhgay_f1edwFO5uZDg@mail.gmail.com">
<div dir="auto">
<div dir="auto"><br>
</div>
<div dir="auto">Nyall</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto"><br>
</div>
<div dir="auto">
<div class="gmail_quote">
<blockquote class="gmail_quote">
<br>
QGIS, on the other hand, correctly imports XLSX files as UTF-8
encoded, while XLS <br>
files are wrongly imported as "system" encoded, even when
UTF-8 [3] is selected as the <br>
encoding in the Data Source Manager vector import window.<br>
<br>
After importing an XLS file, changing the "Data source
encoding" of the <br>
layer to "UTF-8" fixes the text encoding in my tests.<br>
<br>
So, I think QGIS should also automatically import XLS
files as UTF-8 <br>
encoded.<br>
<br>
Best regards.<br>
<br>
Andrea Giudiceandrea<br>
<br>
[1] tested on Windows / OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2
/ Expat 2.1.0 <br>
and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10<br>
[2] text in XLS (BIFF8) files is internally encoded in
UTF-16LE<br>
[3] by the way, two "UTF-8" codecs are incorrectly
listed in the <br>
"Encoding" drop-down menu...<br>
_______________________________________________<br>
QGIS-Developer mailing list<br>
<a href="mailto:QGIS-Developer@lists.osgeo.org"
rel="noreferrer noreferrer" target="_blank"
moz-do-not-send="true">QGIS-Developer@lists.osgeo.org</a><br>
List info: <a
href="https://lists.osgeo.org/mailman/listinfo/qgis-developer"
rel="noreferrer noreferrer noreferrer" target="_blank"
moz-do-not-send="true">https://lists.osgeo.org/mailman/listinfo/qgis-developer</a><br>
Unsubscribe: <a
href="https://lists.osgeo.org/mailman/listinfo/qgis-developer"
rel="noreferrer noreferrer noreferrer" target="_blank"
moz-do-not-send="true">https://lists.osgeo.org/mailman/listinfo/qgis-developer</a><br>
</blockquote>
</div>
</div>
</div>
<br>
</blockquote>
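<p>To illustrate the symptom Andrea describes, here is a small Python
sketch of UTF-8 bytes being read with a single-byte "system" encoding.
It is only a sketch under assumptions: cp1252 stands in for the Windows
system encoding, and the sample string is made up.</p>
<pre>
```python
# Hypothetical attribute value as UTF-8 bytes, e.g. as handed over by a
# reader that already recoded the text to UTF-8.
utf8_bytes = "Citt\u00e0".encode("utf-8")

# Interpreting those bytes with a single-byte "system" encoding
# (cp1252 is an assumption) turns the two-byte sequence into two characters.
misread = utf8_bytes.decode("cp1252")

# Interpreting the very same bytes as UTF-8 restores the original text.
fixed = utf8_bytes.decode("utf-8")

print(repr(misread))
print(repr(fixed))
```
</pre>
<p>This matches the observation that switching the layer's "Data source
encoding" to "UTF-8" after import repairs the text: the bytes are the
same, only their interpretation changes.</p>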
<pre class="moz-signature" cols="72">--
<a class="moz-txt-link-freetext" href="http://www.spatialys.com">http://www.spatialys.com</a>
My software is free, but my time generally not.</pre>
</body>
</html>