[gdal-dev] A tools to change the ascii code upper 128 in shapefile

Even Rouault even.rouault at mines-paris.org
Wed Jul 23 13:16:38 PDT 2014


Andrea,

I assume you're asking because of encoding issues, right? Fortunately, there 
are better alternatives than what you are trying to do.

1) The MapServer shapefile provider doesn't take into account the character 
encoding that is written in the .dbf header or in the .cpg file. So it takes 
the bytes as they appear and outputs them directly (and in 6.4, the header of 
the GetFeature output was ISO-8859-1 / LATIN1).
Now in MapServer master, you can indicate the encoding of the data at the 
layer level ( http://mapserver.org/fr/development/rfc/ms-rfc-103.html ) and 
MapServer will recode it to UTF-8. And all MapServer output headers are now 
UTF-8.

2) As far as GDAL is concerned, the OGR shapefile driver will try to use the 
character encoding in the .dbf header / .cpg file to automatically recode to 
UTF-8. But that might fail, since sometimes the .dbf header indicates "system 
encoding", and OGR then assumes that it is LATIN1.
So if OGR's guess is correct, when combined with MapServer master (i.e. 
use CONNECTIONTYPE OGR), you wouldn't need to indicate the layer encoding at 
all, and everything should be output correctly in UTF-8.
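
To illustrate what "recoding to UTF-8" means at the byte level, here is a 
minimal sketch in plain Python (no GDAL involved). The function name 
recode_to_utf8 and the sample strings are made up for the illustration, and 
CP1250 is only an example of a code page that a "system encoding" .dbf might 
really use. It shows both the case where the LATIN1 guess is right and the 
case where it is wrong.

```python
def recode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode raw attribute bytes with the assumed encoding, re-encode as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Case 1: the bytes really are LATIN1, so the guess is correct.
# 0xE9 is "é" in LATIN1; in UTF-8 it becomes the two bytes 0xC3 0xA9.
latin1_bytes = b"Caf\xe9"
utf8_bytes = recode_to_utf8(latin1_bytes)

# Case 2: the bytes were written in another code page (CP1250 here, purely
# as an example). Decoding them as LATIN1 does not fail, but it silently
# yields different characters than the ones that were intended.
cp1250_bytes = "Łódź".encode("cp1250")
wrong_text = recode_to_utf8(cp1250_bytes).decode("utf-8")
right_text = recode_to_utf8(cp1250_bytes, "cp1250").decode("utf-8")
```

In the second case nothing raises an error, which is why a wrong guess tends 
to go unnoticed until the text is displayed.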

But... if you still want to replace the non-ASCII characters, you can 
do it with OGR (and some code/script). As described in 
http://gdal.org/drv_shapefile.html, you would have to set the SHAPE_ENCODING 
environment variable/configuration option to "" (empty string), which 
disables any character recoding. So you will get the bytes as they are in the 
.dbf file.
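
A rough sketch of such a script with the GDAL Python bindings could look like 
the following. This is untested against real data and makes a few 
assumptions: the helper names replace_high_bytes and sanitize_shapefile are 
invented here, every byte >= 128 is replaced by a single space (so a 
multi-byte UTF-8 character becomes several spaces), and the raw bytes are 
read with GetFieldAsBinary() so that Python does not try to decode them. 
Work on a copy of the shapefile, since the update is done in place.

```python
def replace_high_bytes(raw: bytes, replacement: bytes = b" ") -> bytes:
    """Replace every byte >= 128 with `replacement`, keep ASCII bytes as-is."""
    return b"".join(bytes([b]) if b < 128 else replacement for b in raw)


def sanitize_shapefile(path: str) -> None:
    """Rewrite the string fields of a shapefile in place, ASCII only."""
    # Imported lazily so that replace_high_bytes() is usable without GDAL.
    from osgeo import gdal, ogr

    # Empty string disables recoding: OGR hands back the .dbf bytes unchanged.
    gdal.SetConfigOption("SHAPE_ENCODING", "")

    ds = ogr.Open(path, 1)  # 1 = open in update mode
    if ds is None:
        raise RuntimeError("could not open %s for update" % path)

    layer = ds.GetLayer(0)
    defn = layer.GetLayerDefn()
    string_fields = [
        i for i in range(defn.GetFieldCount())
        if defn.GetFieldDefn(i).GetType() == ogr.OFTString
    ]

    for feature in layer:
        changed = False
        for i in string_fields:
            if not feature.IsFieldSet(i):
                continue
            raw = bytes(feature.GetFieldAsBinary(i))
            cleaned = replace_high_bytes(raw)
            if cleaned != raw:
                # cleaned contains only bytes < 128, so this decode is safe.
                feature.SetField(i, cleaned.decode("ascii"))
                changed = True
        if changed:
            layer.SetFeature(feature)

    ds = None  # dereference the dataset so the changes are flushed to disk
```

Usage would be something like sanitize_shapefile("copy_of_my_layer.shp"), 
where the file name is of course just a placeholder.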

Best regards,

Even

> Hi,
> Quite often the shapefiles that are used to publish data on MapServer
> contain some characters above ASCII code 128 that cause trouble
> in the GetFeatureInfo response (in HTML).
> 
> So I'm considering writing a procedure to scan a shapefile
> (or a set of shapefiles) and programmatically replace every character
> above ASCII code 128 with one below 128.
> 
> The strategy is quite simple.
> In text fields: check whether there is any character other than the
> allowed alphanumerics (a,b,c,d.....,0,1,2,3,4,5,6,7,8,9) or
> punctuation like:
> .,;:-@# and so on. If there is anything else,
> replace it with a space!
> 
> I'd like to use the GDAL API to open and scan the shapefile, but I'm not
> sure it is the best environment
> for such a specific question (detecting codes above ASCII 128).
> 
> My question is: when there is a character above ASCII code 128,
> does the GDAL API return it correctly, or does it get it wrong itself?
> 
> Any thoughts?
> 
> Thx,

-- 
Geospatial professional services
http://even.rouault.free.fr/services.html

