[gdal-dev] OGR field names encoding and SetAttributeFilter

Dennis Gocke dengo at gmx.net
Mon Apr 8 06:19:02 PDT 2013


Hi,
I've stumbled upon an issue related to the encoding of OGR field names.
Assuming OGR uses UTF8 encoding internally for not only field values but
also for field names, there is a problem with SetAttributeFilter.
I haven't found a definitive statement that UTF8 encoding is also used for
field names, but in my opinion this would only make sense and it seems to
work correctly when assuming UTF8 encoding for field names.
But there is an issue with SetAttributeFilter at least with the shapefile
driver.
I have created both an ANSI-encoded and a UTF8-encoded shapefile.
The shapefiles have the text fields ‘Name’ and ‘Äüß’ (note that the byte
codes for ‘Äüß’ are different in ANSI and UTF8). In each shapefile I
created two features. One has its text values set to ‘vÄüß’ and the other
to ‘other’. The field names and field values are correctly returned (both
times assuming that OGR returns UTF8 encoded strings for field names and
field values). If I assumed ANSI encoding for the field names, I would not
get the correct field names in either case.
So everything seems fine so far.
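The byte-level difference mentioned above can be checked with a short sketch. It assumes that "ANSI" means Windows code page 1252; the post does not name the code page, so treat that as an assumption.

```python
# Assumption: "ANSI" means Windows-1252 (cp1252); the post does not say which code page.
field_name = "Äüß"

ansi_bytes = field_name.encode("cp1252")
utf8_bytes = field_name.encode("utf-8")

print(ansi_bytes.hex(" "))  # c4 fc df
print(utf8_bytes.hex(" "))  # c3 84 c3 bc c3 9f
```

The same three characters take three bytes in cp1252 and six in UTF8, so a field name or filter string passed in the wrong encoding cannot match byte for byte.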
So next I tried setting different attribute filters, again assuming that I
need to provide the filter UTF8 encoded:
Name = 'vÄüß' -> no error, correctly filtered
Äüß = 'vÄüß' -> error (SQL Expression Parsing Error: syntax error)
The error probably occurs because the parser does not find a field named
"Äüß".
Passing the attribute filter ANSI-encoded does not work either.
I realize that using non-ASCII characters for field names probably isn’t a
good idea at all, but it would still be nice if it worked.

While I was doing this test I noticed another problem with the attribute
filter. It seems that when the string literal in a filter happens to match
a field name, as in “FieldName1 = 'FieldName2'”, OGR actually evaluates
“FieldName1 = FieldName2” instead. In my opinion this is not correct.
Now it gets really weird:
Although the filter “Name = Äüß” produces an SQL Expression Parsing Error
(which is to be expected because the filter “Äüß = ..” also did), the
filter “Name = 'Äüß'” works, but it is actually evaluated as “Name = Äüß”.
I have attached the two shapefiles.
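To make the difference between the two readings concrete, here is a minimal sketch in plain Python. It does not call OGR; the feature list is a hypothetical stand-in for the two features described above, assuming both text fields of each feature hold the same value.

```python
# Hypothetical in-memory stand-in for the two features described in the post;
# this does not call OGR.
features = [
    {"Name": "vÄüß", "Äüß": "vÄüß"},
    {"Name": "other", "Äüß": "other"},
]

def match_literal(feature, field, literal):
    """Compare a field's value against a string literal."""
    return feature[field] == literal

def match_field(feature, field_a, field_b):
    """Compare the values of two fields within the same feature."""
    return feature[field_a] == feature[field_b]

# Name = 'Äüß' read as field versus literal: no feature has Name equal to "Äüß".
as_literal = [f["Name"] for f in features if match_literal(f, "Name", "Äüß")]

# Name = Äüß read as field versus field: both features hold equal values in the two fields.
as_field = [f["Name"] for f in features if match_field(f, "Name", "Äüß")]

print(as_literal)  # []
print(as_field)    # ['vÄüß', 'other']
```

Under these assumptions the literal reading selects no features while the field reading selects both, which is one way to tell which interpretation a filter actually received.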
Another small issue:
I just realized that when REPACKing a shapefile that has a non-default
encoding, and therefore a *.cpg file, a *_repack.cpg file gets created and
is not deleted.

Although I normally use the C# wrapper, I ran these tests using the OGR C
API directly so that I had full control over the string encodings used.
Best regards,
Dennis
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Shape_Field_Name_Encoding.zip
Type: application/zip
Size: 1318 bytes
Desc: not available
URL: <http://lists.osgeo.org/pipermail/gdal-dev/attachments/20130408/4d2852cb/attachment-0001.zip>

