[gdal-dev] A tool to change ASCII codes above 128 in a shapefile

Andrea Peri aperi2007 at gmail.com
Thu Jul 24 00:04:46 PDT 2014


Hi Even,
thanks for the hint.

I need to try replacing the characters because
the .dbf of a shapefile sometimes contains mixed charsets.

This is due to the usual practice of our shapefile authors (there are
many of them, and they differ).
They very often fill the shapefile by copy/paste from other documents
(Excel, Word, LibreOffice) and work on
Windows, Linux and Mac.

Sometimes the same shapefile is filled in by different users with
different operating systems and software.

The result is that some shapefiles do not have a
single charset encoding but a mix of them.
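
To illustrate the kind of mix I mean, here is a small sketch in plain Python (no GDAL involved). The field values and the helper name try_decode are made up for the example; they only show that the same word stored by two authors can need two different charsets:

```python
# Hypothetical field values: the same word typed by two different authors.
latin1_field = "città".encode("latin-1")  # b'citt\xe0'
utf8_field = "città".encode("utf-8")      # b'citt\xc3\xa0'


def try_decode(raw, encoding):
    """Decode raw bytes with the given charset, or return None on failure."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# No single charset reads both fields correctly.
for encoding in ("latin-1", "utf-8"):
    for raw in (latin1_field, utf8_field):
        print(encoding, raw, repr(try_decode(raw, encoding)))
```

Reading the UTF-8 field as Latin-1 does not fail but gives garbage, and reading the Latin-1 field as UTF-8 fails outright, so neither choice works for the whole file.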

Sometimes the same record has different authors for different fields.

So it is really impossible to establish what the right charset is.
We have tried to give instructions to the producers, but without any positive result.
This happens because the authors often don't understand the issue.

Every user sees their own entries correctly on their own PC and assumes everything is OK.

In this situation the only practical solution I see is to replace all
the characters with a code of 128 or above with some predefined characters.
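
A minimal sketch of the replacement step, assuming the raw field bytes are already available without any recoding. The function name replace_high_bytes and the "?" placeholder are my own choices for the example, not part of GDAL:

```python
def replace_high_bytes(raw: bytes, replacement: bytes = b"?") -> bytes:
    """Return a copy of raw where every byte >= 128 becomes `replacement`.

    The replacement must be one ASCII byte, so the result has the same
    length as the input (the .dbf stores fixed-width fields).
    """
    if len(replacement) != 1 or replacement[0] >= 128:
        raise ValueError("replacement must be a single ASCII byte")
    return bytes(b if b < 128 else replacement[0] for b in raw)


# Hypothetical field contents in two different encodings.
print(replace_high_bytes("città".encode("latin-1")))
print(replace_high_bytes("città".encode("utf-8")))
```

Since the function works byte by byte, a character stored as two bytes in UTF-8 turns into two placeholders, while the same character stored in Latin-1 turns into one.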

So thanks for the hint.

Regards,

Andrea.


2014-07-23 22:16 GMT+02:00 Even Rouault <even.rouault at mines-paris.org>:
> Andrea,
>
> I assume you're asking that because of encoding issues, right? Hopefully
> there are better alternatives than what you are trying to do.
>
> 1) The MapServer shapefile provider doesn't take into account the character
> encoding that is written in the .dbf header or in the .cpg file. So it takes
> the bytes as they appear, and outputs them directly (and in 6.4, the header of
> the GetFeature output was ISO-8859-1 / LATIN1).
> Now in MapServer master, you can indicate the encoding of the data at the
> layer level ( http://mapserver.org/fr/development/rfc/ms-rfc-103.html ) and
> MapServer will recode that to UTF-8. And all MapServer output headers are now
> UTF-8.
>
> 2) As far as GDAL is concerned, the OGR shapefile driver will try to use the
> character encoding in the .dbf header / .cpg to automatically recode to UTF-8.
> But that might fail since sometimes the .dbf header indicates "system
> encoding", and OGR then assumes that it is LATIN1.
> So if the guess of OGR is correct, when combined with MapServer master (i.e.
> use CONNECTION OGR), you wouldn't need to indicate the layer encoding at all
> and everything should be output correctly in UTF-8.
>
> But... if you still want to do your replacing of non-ASCII characters, you can
> do it with OGR (and some code/script). As described in
> http://gdal.org/drv_shapefile.html, you would have to set the SHAPE_ENCODING
> environment variable/configuration option to "" (empty string), and that will
> disable any character recoding. So you will get the bytes as they are in the
> .dbf file.
>
> Best regards,
>
> Even
>
>> Hi,
>> Quite often the shapefiles that are used to publish data on MapServer
>> contain some characters above ASCII code 128, which cause
>> trouble in the GetFeatureInfo response (in HTML).
>>
>> So I'm considering writing a procedure to scan a shapefile
>> (or a set of shapefiles) and programmatically change every character above
>> ASCII code 128 to one below 128.
>>
>> The strategy is quite simple.
>> In text fields: check whether there is any character other than the
>> allowed alphanumerics (a,b,c,d.....,0,1,2,3,4,5,6,7,8,9) or some
>> characters like:
>> .,;:-@# and so on. If there is anything else,
>> change it to a space!
>>
>> I would like to use the GDAL API to open and scan the shapefile, but I'm not
>> sure it is the best environment
>> for such a specific question (detecting codes above 128).
>>
>> My question is: when there is a character above ASCII code 128,
>> does the GDAL API return it correctly, or does it alter it itself?
>>
>> Any thoughts?
>>
>> Thanks,
>
> --
> Geospatial professional services
> http://even.rouault.free.fr/services.html



-- 
-----------------
Andrea Peri
. . . . . . . . .
qwerty àèìòù
-----------------

