[QGIS-Developer] Loading many Excel-Files in QGIS with correct encoding

Even Rouault even.rouault at spatialys.com
Thu Jul 15 14:11:45 PDT 2021


On 15/07/2021 at 22:30, Nyall Dawson wrote:
>
>
> On Fri, 16 Jul 2021, 4:52 am Andrea Giudiceandrea
> <andreaerdna at libero.it> wrote:
>
>     > Isn't this limitation ultimately due to GDAL not reading the encoding
>     > correctly? (Or perhaps it's a limitation in the underlying FreeXL
>     > library...)
>     >
>     > Nyall
>
>
>     Hi Nyall and Andreas,
>     it seems to me that GDAL/OGR [1] reads XLS and XLSX files with their
>     respective proper encodings [2], and ogrinfo outputs the text in UTF-8
>     for both formats.
>
>
> I don't think that's completely correct -- looking at the FreeXL
> documentation, it seems that only some XLS file versions store Unicode
> text, while others have a codepage indicating the encoding, which needs
> to be read from the XLS metadata:
>
> "Any BIFF version from BIFF2 to BIFF5 simply supports CodePage based
> character encoding, i.e. each character simply requires 8 bits to be
> represented (single byte). Correct representation of characters requires
> knowing which one CodePage table has to be applied. This can be determined
> from the workbook or worksheet metadata (it is the CODEPAGE record).
> BIFF8 is much more sophisticated, since any text string is usually encoded
> as Unicode in UTF-16 Little Endian [UTF-16LE] format. This encoding is a
> multi-byte encoding (two bytes are required to represent a single
> character), but being universal no character table is required."
Yes, but FreeXL does the recoding to UTF-8.
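A minimal sketch of what that recoding amounts to, in Python with only the standard codecs. The helper name biff_text_to_utf8 and its simplified codepage handling are illustrative assumptions, not FreeXL's API: it treats CODEPAGE 1200 as UTF-16LE (the BIFF8 case) and any other value as a Windows single-byte code page such as 1252 (the BIFF2 to BIFF5 case).

```python
def biff_text_to_utf8(raw: bytes, codepage: int) -> bytes:
    """Hypothetical helper: recode a BIFF text string to UTF-8 bytes.

    Simplified assumption: codepage 1200 means UTF-16LE (BIFF8); any other
    value is taken to be a Windows single-byte code page (BIFF2 to BIFF5).
    """
    if codepage == 1200:
        codec = "utf-16-le"
    else:
        # e.g. 1252 -> "cp1252"; decode() raises LookupError if Python has no such codec
        codec = f"cp{codepage}"
    # Decode with the workbook's declared encoding, then re-encode as UTF-8
    return raw.decode(codec).encode("utf-8")


# BIFF5-style single-byte string in Windows-1252 ("é" is the byte 0xE9)
print(biff_text_to_utf8(b"caf\xe9", 1252))  # → b'caf\xc3\xa9'
# BIFF8-style string stored as UTF-16LE
print(biff_text_to_utf8("café".encode("utf-16-le"), 1200))  # → b'caf\xc3\xa9'
```

Either way the caller only ever sees UTF-8, which is consistent with ogrinfo printing UTF-8 for both XLS and XLSX.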
>
> Nyall
>
>
>
>
>     Instead, QGIS correctly imports XLSX files as UTF-8 encoded, while XLS
>     files are wrongly imported as "system" encoded, even when UTF-8 [3] is
>     selected as the encoding in the Data Source Manager vector import window.
>
>     After importing an XLS file, changing the layer's "Data source
>     encoding" to "UTF-8" fixes the text encoding in my tests.
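The effect described here can be reproduced with plain Python codecs. Using Windows-1252 as the "system" encoding is an assumption (typical for a Western European Windows install); another system code page would garble the text differently.

```python
# UTF-8 bytes, as GDAL/FreeXL hands them over, for the string "Café"
utf8_bytes = "Café".encode("utf-8")

# Reading them with the assumed system code page (cp1252) produces mojibake
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # → CafÃ©

# Declaring the source as UTF-8 decodes the same bytes correctly
fixed = utf8_bytes.decode("utf-8")
print(fixed)  # → Café
```

When loading many files from a script, the same per-layer setting can be applied in PyQGIS with QgsVectorLayer.setProviderEncoding("UTF-8") on each layer created with the "ogr" provider.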
>
>     So, I think QGIS should automatically import XLS files as UTF-8
>     encoded as well.
>
>     Best regards.
>
>     Andrea Giudiceandrea
>
>     [1] tested on Windows/OSGeo4W: GDAL 3.1.4 / FreeXL 1.0.2 / Expat 2.1.0
>     and GDAL 3.2.2 / FreeXL 1.0.6 / Expat 2.2.10
>     [2] text in XLS (BIFF8) files is internally encoded in UTF-16LE
>     [3] by the way, two "UTF-8" codecs are incorrectly listed in the
>     "Encoding" drop-down list...
>     _______________________________________________
>     QGIS-Developer mailing list
>     QGIS-Developer at lists.osgeo.org
>     List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>     Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer
>

-- 
http://www.spatialys.com
My software is free, but my time generally not.
