[gdal-dev] Wrapper string encodings are inconsistent
Even Rouault
even.rouault at spatialys.com
Wed Mar 25 12:11:00 PDT 2026
Hi Michael,
I assume you're talking about the C# or Java bindings. In the case of
the Python bindings, given the dynamic typing of the language, the
typemap code tries to decode returned strings as UTF-8 when possible and
returns a bytes object if not. It also accepts either a Unicode string
or a bytes object as input.
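
To illustrate the behaviour described above, here is a minimal pure-Python
sketch. It is not the actual SWIG typemap code; the helper names
decode_if_utf8 and to_utf8_bytes are hypothetical and only mimic the idea
of "str when the bytes are valid UTF-8, bytes otherwise" on output and
"str or bytes accepted" on input.

```python
def decode_if_utf8(raw: bytes):
    """Return a str when raw is valid UTF-8, otherwise the untouched bytes."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Unknown or legacy encoding: hand the raw bytes back to the caller.
        return raw


def to_utf8_bytes(value) -> bytes:
    """Accept either str or bytes and produce bytes for the C layer."""
    if isinstance(value, str):
        return value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        # Already binary: pass through unchanged, whatever its encoding.
        return bytes(value)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")


utf8_name = decode_if_utf8("café".encode("utf-8"))      # valid UTF-8
legacy_name = decode_if_utf8("café".encode("latin-1"))  # not valid UTF-8
print(type(utf8_name).__name__, type(legacy_name).__name__)
```

A statically typed binding such as C# or Java cannot return "string or
byte array" from a single method, which is why the same trick does not
carry over directly.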
The issue is that even if exchanges in the GDAL API are *nominally*
supposed to be in UTF-8, some drivers might return strings in an
unknown encoding. That could happen with CSV, for example, or with
shapefiles or MapInfo files whose declared encoding is not understood
by GDAL. I believe there is a ticket about the possibility of creating
2 variants of the SWIG methods for which that could occur: one with
UTF-8, one with a binary type. Actually, that might be this PR, which
went stale: https://github.com/OSGeo/gdal/pull/3825
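
As a rough sketch of the "2 variants" idea, written in Python only for
brevity: the class FieldValue and its methods get_as_string and
get_as_binary are hypothetical names, not part of the GDAL bindings and
not taken from the PR. One accessor decodes strictly as UTF-8, the other
returns the raw bytes so the caller can apply an encoding it knows.

```python
class FieldValue:
    """Hypothetical stand-in for a value whose raw bytes come from a driver."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def get_as_string(self) -> str:
        # UTF-8 variant: strict decoding, raises UnicodeDecodeError on bad input.
        return self._raw.decode("utf-8")

    def get_as_binary(self) -> bytes:
        # Binary variant: no decoding at all, the caller picks the encoding.
        return self._raw


# Bytes as a driver might return them when the file is really CP1252
# but GDAL could not determine that.
value = FieldValue("Zürich".encode("cp1252"))

try:
    text = value.get_as_string()
except UnicodeDecodeError:
    # Fall back to the binary variant with an encoding known to the caller.
    text = value.get_as_binary().decode("cp1252")

print(text)
```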
Even
On 25/03/2026 at 20:00, Michael via gdal-dev wrote:
> Every function which returns char** has the "char **CSL" typemap
> applied, which causes strings in the returned array to be decoded with
> UTF-8.
>
> Every function which accepts a char** parameter has either the "char
> **options", "char **dict", or "char **dictAndCSLDestroy" typemap
> applied, which causes strings in the parameter's array to be encoded
> with UTF-8.
>
> However, many functions which return a single string value or accept
> single strings as arguments do not use UTF-8 encoding. This causes
> several inconsistencies in the wrapper's behavior.
>
> For example, string values taken from UTF-8-decoded string arrays are
> often passed to other functions which do not use UTF-8.
>
> Some examples:
> - AlgorithmRegistry.GetAlgNames() returns a string array of algorithm
> names decoded with UTF-8, but AlgorithmRegistry.InstantiateAlg(string
> algName) does not encode algName with UTF-8.
> - Algorithm.GetArgNames() returns a string array of argument names
> decoded with UTF-8, but Algorithm.GetArg(string argName) does not
> encode argName with UTF-8.
> - GeomCoordinatePrecision.GetFormats() returns a string array of
> format names decoded with UTF-8, but
> GeomCoordinatePrecision.GetFormatSpecificOptions(string formatName)
> does not encode formatName with UTF-8.
>
> Also, some functions which return a string array have related
> functions which return a single string value, but the strings in the
> array are decoded with UTF-8 while the single string values are not.
> For example, AlgorithmArg.GetAsStringList() returns an array of
> strings decoded with UTF-8, but AlgorithmArg.GetAsString() does not
> decode its returned string with UTF-8.
>
> And finally, many other string functions which accept or return
> strings not encoded with UTF-8 probably _should be UTF-8_.
>
> Some examples:
> - Any "Get*Name" function or "name" property
> - Any "Get*Description" function
> - Any "Create*", "Delete*", or "Get*" function which accepts a "*name"
> parameter
>
> Really, are there _any_ strings which _shouldn't_ be encoded with
> UTF-8? I can't find a single reason why every string passed to the
> wrapper should not be encoded as UTF-8, and no reason why every string
> retrieved from the wrapper should not be decoded with UTF-8.
>
> --
> Michael Bucari
--
http://www.spatialys.com
My software is free, but my time generally not.