[gdal-dev] Re: WFS and -where with non-ASCII characters

Rahkonen Jukka Jukka.Rahkonen at mmmtike.fi
Tue Jan 3 09:51:26 EST 2012


It is getting really amusing to see how some characters change when they travel a few times between Finland, France and Poland :)

-Jukka-

> -----Original Message-----
> From: Even Rouault [mailto:even.rouault at mines-paris.org]
> Sent: 3 January 2012 16:44
> To: Rahkonen Jukka
> Cc: 'gdal-dev at lists.osgeo.org'
> Subject: Re: [gdal-dev] Re: WFS and -where with non-ASCII characters
> 
> Quoting Rahkonen Jukka <Jukka.Rahkonen at mmmtike.fi>:
> 
> >
> > Mateusz Łoskot wrote:
> >
> > > Jukka Rahkonen wrote:
> > > > I took the successful query sent by Ari from the TinyOWS log and
> > > > copied it literally into Windows and this way it works:
> > > >
> > > > -where name='Hämeenkylä'
> > >
> > > The Windows Command Prompt can work with UTF-8 characters if you
> > > change the codepage to UTF-8:
> > >
> > > 0) Open a new prompt (cmd.exe)
> > > 1) Change the font to Lucida Console
> > > 2) chcp 65001
> > >
> > > And OGR can consume filter without problems:
> > >
> > > -where "name=\"Hämeenkylä\""
> > >
> > > Note that the escaped quotes \"...\" are needed so as not to confuse
> > > the OGR SQL compiler; otherwise the value Hämeenkylä will be parsed as
> > > OGR SQL node type SNT_COLUMN instead of SNT_CONSTANT for the field
> > > value.
> > >
> > > However, I think the problem may be with TinyOWS. It throws this error:
> > >
> > > <ows:ExceptionText>QUERY_STRING contains forbidden characters</ows:ExceptionText>
> > >
> > > which is generated by TinyOWS:
> > >
> > > http://www.tinyows.org/trac/browser/trunk/src/struct/cgi_request.c?rev=525#L208
> > >
> > > where TinyOWS simply tests the characters passed in the request
> > > against a fixed range: A-Za-zà-ÿ
> > > Comparing extended ASCII codes, the value 'ä' is outside of
> > > this range anyway.
> > >
> > > I get neither a WFS exception nor an OGR error when querying with
> > > some (not all) Polish diacritics:
> > >
> > > ogrinfo WFS:http://hip.latuviitta.org/cgi-bin/tinyows
> > > lv:pks_tilastoalue_piste -where "name=\"ąęśćł\""
> > >
> > > Of course, it gives an empty result set.
> > >
> > > I think it would be a good idea to try against a different WFS server.
> >
> > I followed your example, but changing the font and running chcp 65001
> > did not actually change anything as far as I can see. OGR may consume
> > -where "name=\"Hämeenkylä\"" OK but, as you said, TinyOWS rejects it.
> > However, -where name='Hämeenkylä' gives the correct result. But
> > it gave the correct result even before changing the font and codepage.
> >
> > The TinyOWS log shows your -where "name=\"ąęśćł\"" as "aescl", but I am
> > not sure whether the characters have changed or my console just shows
> > them as ASCII characters.
> >
> > MapServer also behaves as it did before. My codepage is now 65001 and
> > -where "name=\"Hämeenkylä\"" gives an HTTP 500 error while
> > -where name='Hämeenkylä' gives the correct result.
> 
> Yes, your observation confirms my own brief testing. Mateusz's trick with
> chcp does fix the display of UTF-8 characters in the console, but when I
> enter an accented character, the command line utilities consume it as
> Latin-1.
> Note: I'm on Windows XP.
> 
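> I've verified it with a trivial program compiled with MSVC:
> 
> #include <stdio.h>
> #include <string.h>
> 
> int main(int argc, char* argv[])
> {
>    /* print the byte length of the first argument */
>    printf("%d\n", (int)strlen(argv[1]));
>    return 0;
> }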
> 
> If I try "test éven", it prints 4, whereas it should print 5 if it were
> really UTF-8.
> 
> >
> > -Jukka Rahkonen-

