[QGIS-Developer] Loading many Excel-Files in QGIS with correct encoding

Nyall Dawson nyall.dawson at gmail.com
Thu Jul 15 13:30:33 PDT 2021


On Fri, 16 Jul 2021, 4:52 am Andrea Giudiceandrea, <andreaerdna at libero.it>
wrote:

> > Isn't this limitation ultimately that GDAL isn't reading the encoding
> > correctly? (Or perhaps it's a limitation in the underlying freexl
> > library...)
> >
> > Nyall
>
>
> Hi Nyall and Andreas,
> it seems to me that GDAL/OGR [1] reads XLS and XLSX files with their
> respective proper encodings [2], and ogrinfo outputs the text in UTF-8 for
> both formats.
>

I don't think that's completely correct -- looking at the freexl
documentation, it seems that only some xls file versions store text as
Unicode, and others have a codepage indicating the encoding, which needs to
be read from the xls metadata:

"Any BIFF version from BIFF2 to BIFF5 simply supports CodePage based
character encoding, i.e. each character simply requires 8 bits to be
represented (single byte). Correct representation of characters requires
knowing which one CodePage table has to be applied. This can be determined
from the workbook or worksheet metadata (it is the CODEPAGE record).
BIFF8 is much more sophisticated, since any text string is usually encoded
as Unicode in UTF-16 Little Endian [UTF-16LE] format. This encoding is a
multi-byte encoding (two bytes are required to represent a single
character), but being universal no character table is required."
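
To make the distinction concrete, here is a rough Python sketch of what a
reader would have to do with the raw string bytes. It is only an
illustration under assumptions: decode_biff_text and the BIFF_CODEPAGES
table are hypothetical names, not part of freexl, GDAL or QGIS, the table
lists only a few common CODEPAGE values, and the BIFF8 branch ignores the
cases where BIFF8 strings are not plain UTF-16LE.

```python
from typing import Optional

# Hypothetical, partial mapping from a CODEPAGE record value to a Python
# codec name. A real reader would need a much more complete table.
BIFF_CODEPAGES = {
    437: "cp437",
    850: "cp850",
    1250: "cp1250",
    1251: "cp1251",
    1252: "cp1252",
}


def decode_biff_text(raw: bytes, biff_version: int,
                     codepage: Optional[int] = None) -> str:
    """Decode raw string bytes from an xls workbook (simplified sketch)."""
    if biff_version >= 8:
        # BIFF8: text is usually UTF-16LE, so no codepage table is needed.
        return raw.decode("utf-16-le")
    # BIFF2 to BIFF5: single-byte text, meaningless without the codepage.
    if codepage is None:
        raise ValueError("BIFF2-BIFF5 text needs the CODEPAGE record value")
    if codepage not in BIFF_CODEPAGES:
        raise ValueError(f"unsupported codepage: {codepage}")
    return raw.decode(BIFF_CODEPAGES[codepage])


# The same text stored the BIFF5 way (cp1252) and the BIFF8 way (UTF-16LE).
old_style = "Grüße".encode("cp1252")
new_style = "Grüße".encode("utf-16-le")

print(decode_biff_text(old_style, 5, codepage=1252))
print(decode_biff_text(new_style, 8))
```

Treating the old-style bytes as UTF-8 (or as whatever the "system"
encoding happens to be) is exactly the kind of guess that goes wrong, which
is why the codepage has to come from the file itself.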

Nyall




> Instead, QGIS correctly imports XLSX files as UTF-8 encoded, while XLS
> files are wrongly imported as "system" encoded, even when UTF-8 [3] is
> selected as the encoding in the Data Source Manager vector import window.
>
> After importing an XLS file, changing the "Data source encoding" of the
> layer to "UTF-8" fixes the text in my tests.
>
> So, I think QGIS should also automatically import XLS files as UTF-8
> encoded.
>
> Best regards.
>
> Andrea Giudiceandrea
>
> [1] tested on Windows /OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2 / Expat 2.1.0
> and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10
> [2] text in XLS (BIFF8) files is internally encoded in UTF-16LE
> [3] by the way, there are incorrectly two "UTF-8" codecs listed in the
> "Encoding" drop down menu list...