[Benchmarking] Pgsql set up

Paul Ramsey pramsey at cleverelephant.ca
Fri Aug 6 15:24:13 EDT 2010


Cédric,
I have patched shp2pgsql to allow it to swallow the illegal non-utf8
characters, since that seems like the cleanest approach with the data
being largely in utf8. The data is loading up into the main database
right now. If you want to load your own database, you'll need to pull
and build the latest version of the 1.5 PostGIS branch, and set
UTF8_DROP_BAD_CHARACTERS to 1 in the shp2pgsql-core.c file.
Best,
Paul
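
For readers following along, here is a minimal Python sketch of the
"swallow the illegal characters" idea. It is not the shp2pgsql patch itself
(that is C code in shp2pgsql-core.c); the sample bytes and the helper name
decode_dropping_bad_bytes are made up for illustration. It shows a strict
UTF-8 decode failing on a stray LATIN1 byte, and a lenient decode that
drops the offending byte instead.

```python
# Hypothetical sample: one label stored as valid UTF-8, one stored as LATIN1.
good = "Peña Gacha".encode("utf-8")   # ñ becomes the two bytes 0xC3 0xB1
bad = "Peña".encode("latin-1")        # ñ becomes the single byte 0xF1
mixed = good + b"|" + bad


def decode_dropping_bad_bytes(raw: bytes) -> str:
    """Decode as UTF-8, silently discarding bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ignore")


# A strict decode fails as soon as it reaches the LATIN1 byte.
try:
    mixed.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# The lenient decode keeps the valid UTF-8 text and loses the bad character.
cleaned = decode_dropping_bad_bytes(mixed)
print(strict_ok, cleaned)
```

The trade-off is that the LATIN1-encoded characters are lost rather than
converted, which is acceptable only because the data is largely UTF-8.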

On Fri, Aug 6, 2010 at 10:28 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> Cedric,
>
> The problem is now clear to me, thanks for following up. The data in
> the DBF files is in fact kind of corrupt. That is, it is in a mix of
> encodings. Most of it appears to be UTF8, but there are also some
> LATIN1 characters in there too. And LATIN1 characters are illegal in
> UTF8. So you can try to load using UTF8 and the loader will fail when
> it hits the illegal character. Or you can load using LATIN1, and the
> UTF8 characters will end up as random-looking high-bit characters in the
> database.
>
> Soooo.... not sure where that leaves us. I just checked the latest
> version of shp2pgsql in SVN and we don't have support for handling
> corrupt strings in UTF8 right now. I guess that could be my
> contribution to the benchmarking effort. In the meantime, I'm pleased
> to see OSM continuing their strong devotion to standards.
>
> P.
>
> On Fri, Aug 6, 2010 at 5:48 AM, Cédric Briançon
> <cedric.briancon at geomatys.fr> wrote:
>> Le 29/07/2010 21:04, Paul Ramsey a écrit :
>>>
>>> Host 12.189.158.77
>>> User postgres
>>> Password postgres
>>> Access only available from the local network
>>> Tables
>>>   Building
>>>   Contour
>>>   Industry
>>>   Motorway
>>>   Point_labels_for_geometry
>>>   Point_labels_no_geometry
>>>   Ramp
>>>   Road
>>>   Settlement
>>>   Track
>>>
>>>
>>> P.
>>> _______________________________________________
>>> Benchmarking mailing list
>>> Benchmarking at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/benchmarking
>>>
>>>
>>>
>>
>> Hi Paul,
>>
>> I've imported the shapefiles into a personal database, using the LATIN1
>> (ISO-8859-1) encoding as you advised me on IRC. The import worked, but I
>> have strange characters in the point label tables.
>> So I checked on the database server for the benchmarking session, and
>> it displays the same wrong characters.
>>
>> If you want to check on the benchmarking server, here is a copy of my shell
>> session:
>>
>> psql -U postgres
>>
>> postgres=# \c benchmarking
>> psql (8.4.4)
>> You are now connected to database "benchmarking".
>> benchmarking=# select etiqueta from point_labels_for_geometry where gid='4';
>>  etiqueta
>> -------------
>>  Peña Gacha
>> (1 row)
>>
>> In the etiqueta field, the result seems to contain wrongly encoded
>> characters. I'm not a specialist in this domain; could you please check
>> whether something is wrong here?
>> I've also tried switching my bash shell to en_US.iso88591 (with the LANG
>> variable), just to check, but the result is the same for this PostgreSQL
>> query.
>> I don't know if the database needs a "client_encoding" setting; maybe it
>> would help. Or maybe the database should be in another encoding before
>> shp2pgsql is run.
>>
>> This specific field is the one that we request in the WMS style for this
>> layer, so if the database has wrong characters in it, the WMS will display
>> them as they are. That is why I have a particular interest in it.
>>
>> Anyway, thanks for the import into postgis.
>> Cédric Briançon.
>>
>>
>>
>>
>>
>
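
The garbled label in the psql session quoted above ("Peña Gacha") matches
what UTF-8 bytes look like when they are read as LATIN1. The short Python
sketch below reproduces that effect and reverses it. It assumes the
intended label is "Peña Gacha", which is an inference from the output, not
something stated in the thread.

```python
# Assumed intended label (inferred from the garbled psql output).
original = "Peña Gacha"

# Encode as UTF-8, then misinterpret those bytes as LATIN1, as the loader did.
misread = original.encode("utf-8").decode("latin-1")

# Reversing the mistake: re-encode as LATIN1 to recover the raw bytes,
# then decode them with the encoding they were really written in.
repaired = misread.encode("latin-1").decode("utf-8")

print(misread)
print(repaired)
```

This round trip only works for values that were entirely UTF-8 to begin
with; rows that also contain genuine LATIN1 bytes would raise a
UnicodeDecodeError on the final decode.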

