[gdal-dev] Re: WFS and -where with non-ASCII characters

Even Rouault even.rouault at mines-paris.org
Tue Jan 3 09:31:22 EST 2012


Quoting Mateusz Łoskot <mateusz at loskot.net>:

> On 3 January 2012 13:07, Ari Jolma <ari.jolma at gmail.com> wrote:
> > On 01/03/2012 02:45 PM, Mateusz Łoskot wrote:
> >> On 3 January 2012 11:07, Jukka Rahkonen <jukka.rahkonen at mmmtike.fi> wrote:
> >>>
> >>> I took the successful query sent by Ari from the TinyOWS log and copied
> >>> it literally into Windows and this way it works:
> >>>
> >>> -where name='Hämeenkylä'
> >>
> >> Windows Command Prompt can work with UTF-8 characters if you change
> >> codepage to UTF-8:
> >>
> >> 0) Open new prompt (cmd.exe)
> >> 1) Change font to Lucida Console
> >> 2) chcp 65001
> >>
> >> And OGR can consume filter without problems:
> >>
> >> -where "name=\"Hämeenkylä\""
> >>
> >> Note that the \"\" is needed so as not to confuse the OGR SQL compiler;
> >> otherwise the value Hämeenkylä will be parsed as OGR SQL node type
> >> SNT_COLUMN instead of SNT_CONSTANT for the field value.
> >
> >
> > Is that really so?
>
> I have checked the two variants under a debugger and that's what I see,
> assuming I am looking at the right place.
>
> > At least in PostgreSQL " and ' have different uses. " is
> > used for column names that are not all lowercase or that contain special
> > characters, and ' is used for string constants (as in this case).
>
> Perhaps the parser gets confused by extended ASCII or non-ASCII characters,
> and then the meaning of " and ' is affected.

The OGR SQL dialect allows " and ' to be used interchangeably for string literals.
However, the SQL standard (or at least the implementations I'm familiar with, like
SQLite and PostgreSQL) only uses ' for string literals and " for column/table names.
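
To illustrate the stricter convention, here is a minimal sketch using Python's
built-in sqlite3 module (not OGR). The table "places" and the column
"alt name" are made up for this example. It only shows how SQLite resolves
the two quote styles when the double-quoted text matches an existing column:

```python
import sqlite3

# Hypothetical in-memory table; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE places (name TEXT, "alt name" TEXT)')
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [("Hämeenkylä", "Hämeenkylä"), ("alt name", "Tavastby")],
)

# Single quotes: 'alt name' is a string literal compared against name.
literal_rows = conn.execute(
    "SELECT name FROM places WHERE name = 'alt name'"
).fetchall()

# Double quotes: "alt name" is an identifier, so name is compared against
# the value of the "alt name" column in the same row.
column_rows = conn.execute(
    'SELECT name FROM places WHERE name = "alt name"'
).fetchall()

print(literal_rows)
print(column_rows)
```

The first query returns the row whose name is the text alt name; the second
returns the row where both columns hold the same value.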

I'd discourage anyone from using " for string literals with OGR SQL, because
ultimately it would be good to be stricter in order to be able to distinguish
column names that are quoted because they contain a special character from
string literals. Currently the following two tests would be interpreted the same
way:

1) a_column = "a column with &( weird characters"
2) a_column = 'a literal with &( weird characters'

That is to say, in case 1) the right-hand side of the comparison is currently
treated as a literal value and not as a column name.

Some time ago I created a patch to start implementing that stricter mode
( http://trac.osgeo.org/gdal/ticket/4280 ), but it can break existing uses of OGR,
so it is perhaps material for GDAL 2.0.


>
> Best regards,
> --
> Mateusz Loskot, http://mateusz.loskot.net
> _______________________________________________
> gdal-dev mailing list
> gdal-dev at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/gdal-dev
>



