[Benchmarking] Pgsql set up

Paul Ramsey pramsey at cleverelephant.ca
Sat Aug 7 12:22:19 EDT 2010


Instead of the character there will be a period.

P.
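
To illustrate the behaviour discussed in this thread, here is a minimal Python sketch. It is not the shp2pgsql patch itself (that lives in C, in shp2pgsql-core.c); the handler name "period" and the function decode_swallowing_bad_bytes are hypothetical names chosen for this example. It assumes that "swallowing" means each undecodable byte run is replaced by a single period, as described above.

```python
import codecs

# Hypothetical error handler (not part of shp2pgsql): substitute a period
# for each run of bytes that is not valid UTF-8, then resume decoding.
def _period_handler(exc):
    if isinstance(exc, UnicodeDecodeError):
        return (".", exc.end)
    raise exc

codecs.register_error("period", _period_handler)

def decode_swallowing_bad_bytes(raw: bytes) -> str:
    """Decode as UTF-8, replacing illegal byte sequences with '.'."""
    return raw.decode("utf-8", errors="period")

label = "Pe\u00f1a Gacha"            # the label from the thread, with n-tilde

utf8_bytes = label.encode("utf-8")      # well-formed UTF-8
latin1_bytes = label.encode("latin-1")  # the byte 0xF1 alone is illegal in UTF-8

# Loading as strict UTF-8 fails on the LATIN1 byte.
try:
    latin1_bytes.decode("utf-8")
    strict_utf8_failed = False
except UnicodeDecodeError:
    strict_utf8_failed = True

# Loading as LATIN1 never fails, but turns UTF-8 text into mojibake.
mojibake = utf8_bytes.decode("latin-1")

# Loading as UTF-8 with the lenient handler keeps good text intact and
# puts a period where the illegal byte was.
cleaned = decode_swallowing_bad_bytes(latin1_bytes)
untouched = decode_swallowing_bad_bytes(utf8_bytes)
```

With this sketch the label is neither dropped nor truncated: a LATIN1-encoded label keeps its length, and only the offending character is lost.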

On 2010-08-07, at 4:25 AM, Adrian Custer <adrian.custer at geomatys.fr> wrote:

> On Fri, 2010-08-06 at 12:24 -0700, Paul Ramsey wrote:
>> Cédric,
>> I have patched shp2pgsql to allow it to swallow the illegal non-utf8
>> characters, since that seems like the cleanest approach with the data
>> being largely in utf8.
> 
> 
> What does 'swallow' mean? Are we left without the label, with a
> truncated (and potentially false) label, or with a label in which the
> missing characters are present with some kind of spacer marker?
> 
> --adrian
> 
>> The data is loading up into the main database
>> right now. If you want to load your own database, you'll need to pull
>> and build the latest version of the 1.5 PostGIS branch, and set the
>> UTF8_DROP_BAD_CHARACTERS to 1 in the shp2pgsql-core.c file.
>> Best,
>> Paul
>> 
>> On Fri, Aug 6, 2010 at 10:28 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>>> Cedric,
>>> 
>>> The problem is now clear to me, thanks for following up. The data in
>>> the DBF files is in fact kind of corrupt. That is, it is in a mix of
>>> encodings. Most of it appears to be UTF8, but there are also some
>>> LATIN1 characters in there too. And LATIN1 characters are illegal in
>>> UTF8. So you can try to load using UTF8 and the loader will fail when
>>> it hits the illegal character. Or you can load using LATIN1, and the
>>> UTF8 characters will end up as random-looking high-bit characters in the
>>> database.
>>> 
>>> Soooo.... not sure where that leaves us. I just checked the latest
>>> version of shp2pgsql in SVN and we don't have support for handling
>>> corrupt strings in UTF8 right now. I guess that could be my
>>> contribution to the benchmarking effort. In the meantime, I'm pleased
>>> to see OSM continuing their strong devotion to standards.
>>> 
>>> P.
>>> 
>>> On Fri, Aug 6, 2010 at 5:48 AM, Cédric Briançon
>>> <cedric.briancon at geomatys.fr> wrote:
>>>> On 29/07/2010 21:04, Paul Ramsey wrote:
>>>>> 
>>>>> Host 12.189.158.77
>>>>> User postgres
>>>>> Password postgres
>>>>> Access only available from the local network
>>>>> Tables
>>>>>  Building
>>>>>  Contour
>>>>>  Industry
>>>>>  Motorway
>>>>>  Point_labels_for_geometry
>>>>>  Point_labels_no_geometry
>>>>>  Ramp
>>>>>  Road
>>>>>  Settlement
>>>>>  Track
>>>>> 
>>>>> 
>>>>> P.
>>>>> _______________________________________________
>>>>> Benchmarking mailing list
>>>>> Benchmarking at lists.osgeo.org
>>>>> http://lists.osgeo.org/mailman/listinfo/benchmarking
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> Hi Paul,
>>>> 
>>>> I've imported the shapefiles into a personal database, using the LATIN1
>>>> (ISO-8859-1) encoding as you advised me on IRC. The import worked, but I
>>>> have strange characters in the point label tables.
>>>> So I checked on the database server for the benchmarking session, and
>>>> it displays the same wrong characters.
>>>> 
>>>> If you want to check on the benchmarking server, here is a copy of my shell
>>>> session:
>>>> 
>>>> psql -U postgres
>>>> 
>>>> postgres=# \c benchmarking
>>>> psql (8.4.4)
>>>> You are now connected to database "benchmarking".
>>>> benchmarking=# select etiqueta from point_labels_for_geometry where gid='4';
>>>> etiqueta
>>>> -------------
>>>> Peña Gacha
>>>> (1 row)
>>>> 
>>>> The etiqueta field seems to contain wrongly encoded characters.
>>>> I'm not a specialist in this area; could you please check whether
>>>> something is wrong here?
>>>> I've also tried switching my bash shell to en_US.iso88591 (via the LANG
>>>> variable), just to check, but the result of this PostgreSQL query is the
>>>> same.
>>>> I don't know whether the database needs a "clientencoding" property; maybe
>>>> that would help. Or maybe the database should be in another encoding before
>>>> shp2pgsql is run.
>>>> 
>>>> This specific field is the one that we request through the WMS style on this
>>>> layer, so if the database has wrong characters in it, the WMS will display
>>>> them as they are. That's why I have a particular interest in it.
>>>> 
>>>> Anyway, thanks for the import into postgis.
>>>> Cédric Briançon.
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
> 
> 
