[gdal-dev] A tools to change the ascii code upper 128 in shapefile
Even Rouault
even.rouault at mines-paris.org
Wed Jul 23 13:16:38 PDT 2014
Andrea,
I assume you're asking because of encoding issues, right? Fortunately,
there are better alternatives to what you are trying to do.
1) The MapServer shapefile provider doesn't take into account the character
encoding that is written in the .dbf header or in the .cpg file. So it takes
the bytes as they appear and outputs them directly (and in 6.4, the header of
the GetFeature output was ISO-8859-1 / LATIN1).
Now in MapServer master, you can indicate the encoding of the data at the
layer level ( http://mapserver.org/fr/development/rfc/ms-rfc-103.html ) and
MapServer will recode it to UTF-8. All MapServer output headers are now
UTF-8.
2) As far as GDAL is concerned, the OGR shapefile driver will try to use the
character encoding in the .dbf header / .cpg to automatically recode to UTF-8.
But that might fail, since the .dbf header sometimes indicates "system
encoding", and OGR then assumes that it is LATIN1.
So if OGR's guess is correct, when combined with MapServer master (i.e.
using CONNECTION OGR), you wouldn't need to indicate the layer encoding at
all, and everything should be output correctly in UTF-8.
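
To illustrate the LATIN1 fallback described in 2), here is a minimal sketch in
plain Python (no GDAL involved). The sample strings and the recode_to_utf8
helper are hypothetical and only mimic the kind of recoding described above:
a correct guess round-trips cleanly, a wrong guess silently produces garbage.

```python
# Bytes as they might sit in a .dbf text field (hypothetical sample values).
latin1_bytes = "Orléans".encode("latin-1")
cp1250_bytes = "Łódź".encode("cp1250")


def recode_to_utf8(raw: bytes, source_encoding: str = "latin-1") -> bytes:
    """Decode raw bytes with source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# The data really is LATIN1: the default assumption gives the right text.
good = recode_to_utf8(latin1_bytes).decode("utf-8")

# The data is CP1250 but is assumed to be LATIN1: no error is raised,
# yet the resulting text is wrong.
bad = recode_to_utf8(cp1250_bytes).decode("utf-8")

# Stating the real encoding explicitly gives the right text again.
fixed = recode_to_utf8(cp1250_bytes, "cp1250").decode("utf-8")
```

Decoding as LATIN1 never fails, because every byte value maps to a character,
so a wrong guess cannot be detected from an error alone.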
But... if you still want to replace the non-ASCII characters, you can
do it with OGR (and some code/script). As described in
http://gdal.org/drv_shapefile.html, you would have to set the SHAPE_ENCODING
environment variable/configuration option to "" (empty string), which
disables any character recoding. So you will get the bytes as they are in the
.dbf file.
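
A rough sketch of such a script, assuming the GDAL Python bindings are
installed. The function names replace_non_ascii and scan_shapefile are
hypothetical. How the bindings hand back strings that are not valid UTF-8
(str or bytes) may depend on the GDAL version, so the code accepts both;
treat that part as an assumption to verify against your installation.

```python
def replace_non_ascii(raw: bytes, replacement: bytes = b" ") -> bytes:
    """Replace every byte >= 128 with `replacement` (a space by default)."""
    return b"".join(bytes([b]) if b < 128 else replacement for b in raw)


def scan_shapefile(path: str) -> None:
    """Print the cleaned value of each string field of each feature."""
    from osgeo import gdal, ogr  # requires the GDAL Python bindings

    # Empty string disables recoding, per the shapefile driver documentation.
    gdal.SetConfigOption("SHAPE_ENCODING", "")
    ds = ogr.Open(path)
    if ds is None:
        raise RuntimeError("could not open %s" % path)
    layer = ds.GetLayer(0)
    defn = layer.GetLayerDefn()
    for feature in layer:
        for i in range(defn.GetFieldCount()):
            field_defn = defn.GetFieldDefn(i)
            if field_defn.GetType() != ogr.OFTString:
                continue
            value = feature.GetField(i)
            if value is None:
                continue
            # Assumption: the bindings may return str or bytes here.
            if isinstance(value, str):
                raw = value.encode("utf-8", "surrogateescape")
            else:
                raw = bytes(value)
            cleaned = replace_non_ascii(raw).decode("ascii")
            print(feature.GetFID(), field_defn.GetName(), cleaned)
```

For example, replace_non_ascii(b"Orl\xe9ans") gives b"Orl ans". Note that a
character stored as several bytes (e.g. in UTF-8 data) becomes several spaces.
This only prints the cleaned values; writing them back to the .dbf is left out.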
Best regards,
Even
> Hi,
> Quite often the shapefiles that are used to publish data on MapServer
> contain characters with codes of 128 and above (non-ASCII) that cause
> trouble in the GetFeatureInfo response (in HTML).
>
> So I'm considering writing a procedure to scan a shapefile
> (or a set of shapefiles) and programmatically replace every character
> code of 128 or above with an ASCII code below 128.
>
> The strategy is quite simple.
> In each text field: check whether there is any character other than the
> allowed alphanumerics (a,b,c,d.....,0,1,2,3,4,5,6,7,8,9) or
> characters like
> .,;:-@# and so on. If there is anything else, replace
> it with a space!
>
> I would like to use the GDAL API to open and scan the shapefile, but I'm
> not sure it is the best environment
> for such a specific task (detecting codes of 128 and above).
>
> My question is: when there is a character with a code of 128 or above,
> does the GDAL API return it correctly, or does it alter it itself?
>
> Any thoughts?
>
> Thx,
--
Geospatial professional services
http://even.rouault.free.fr/services.html