[mapserver-dev] encodings

Paul Ramsey pramsey at opengeo.org
Fri May 15 12:57:36 EDT 2009


I'm with both Frank and Thomas. Rigorously handling UTF-8 requires care.
But a hack goes a long way. Anything < 128 is just ASCII anyway, so
you can get a long way ignoring the problem. However, when problems
arise, if the MapServer policy is "we handle all strings internally as
if they are UTF-8" then the roadmap to fixing the problems is clear.
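
To make the "anything < 128 is just ASCII" point concrete, here is a minimal Python sketch. It is not MapServer code, and the helper name is_valid_utf8 is made up for illustration. It shows that pure-ASCII bytes are already valid UTF-8, that a lone high-bit byte from a single-byte encoding is not, and that cutting a UTF-8 string at an arbitrary byte offset can split a character.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes cleanly as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


ascii_label = b"Main Street"           # every byte is below 128
latin1_label = b"Caf\xe9"              # 'e-acute' as one ISO-8859-1 / WIN-1252 byte
utf8_label = "Caf\u00e9".encode("utf-8")  # 'e-acute' becomes the two bytes C3 A9

print(is_valid_utf8(ascii_label))      # ASCII passes through untouched
print(is_valid_utf8(latin1_label))     # high-bit byte with no valid continuation
print(is_valid_utf8(utf8_label))       # properly encoded UTF-8
print(is_valid_utf8(utf8_label[:4]))   # byte slice that cuts the last character in half
```

The last line is the kind of character-boundary problem Thomas is worried about: it only shows up once a string actually contains multi-byte characters.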

WRT not needing to be explicit about encodings, I disagree. Editing
tools on Windows (for both map files and dbf files) will tend to spit
out WIN-1252 encoded files, in which high-bit bytes that are illegal in
UTF-8 are quite common. Being able to set the input encoding for map
files and shape files will be key. For some data sources, like pgsql,
everything is trivial because the encoding issue is handled in the
database library, but for stupider sources we are going to need to be
explicit (and have some arguments around things like "what is the
default expected encoding for a map file"; my guess is ISO-8859-1 or
WIN-1252).
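
As a rough sketch of what "being explicit" could look like, the Python below converts bytes from a declared source encoding into UTF-8. The function name to_utf8 and its cp1252 default are assumptions for illustration only; they do not describe any existing MapServer setting.

```python
def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> bytes:
    """Decode raw using the declared source encoding, then re-encode as UTF-8.

    Hypothetical helper: the default of cp1252 (WIN-1252) is only a guess
    at what a Windows editing tool would have written.
    """
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


dbf_value = b"Montr\xe9al"             # a dbf attribute saved by a Windows tool
print(to_utf8(dbf_value))              # the single byte E9 becomes C3 A9

# The choice of default matters: byte 0x80 is the euro sign in WIN-1252
# but an invisible control character in ISO-8859-1.
print(to_utf8(b"\x80", "cp1252"))
print(to_utf8(b"\x80", "latin-1"))
```

For most Western European text the two candidate defaults give the same result; they differ only for bytes 0x80 to 0x9F, which is where the argument over the default would actually bite.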

P.

On May 15, 2009, at 9:43 AM, Frank Warmerdam wrote:

> Paul Ramsey wrote:
>> Agree. Step one, a proposal that specifies the rules of the game. The
>> problem right now is the rules are pretty unclear. What is the
>> internal encoding for Mapserver? etc. Incidentally, making all
>> internal string handling UTF8 and then setting the MAP and LAYER flags
>> to indicate what the inputs are would be a nice touch.
>
> Folks,
>
> My personal opinion is that we should work towards making UTF8 the internal
> representation.  It would be up to data sources to convert on the fly to
> UTF-8, and when needed we could convert on output.
>
> Interesting strings in the mapfile could also be provided in utf-8.
>
> It should not be necessary to specify encoding anywhere except if some
> input datasources have no way of knowing their encoding (accurately)
> in which case perhaps there should be a mechanism to set it as a user.
>
> This was the approach taken in OGR.
>
> Thomas has noted in IRC that rigorously handling UTF-8 strings requires
> careful handling of where the character boundaries are in complex strings.
> My experience in OGR has been that this is seldom an issue, but then I'm
> a bit of a hack and willing to take risks that others are sometimes not
> willing to.
>
> Best regards,
> -- 
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | Geospatial Programmer for Rent
>


--
Paul Ramsey
OpenGeo - http://opengeo.org
PostGIS. Because you're just that good looking.


