[Benchmarking] Pgsql set up
Paul Ramsey
pramsey at cleverelephant.ca
Sat Aug 7 12:22:19 EDT 2010
In place of each illegal character there will be a period.
P.
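
To make "swallow" concrete, here is a rough sketch of the substitution in Python. It is not the C code in shp2pgsql-core.c, only an illustration of the behaviour described above; the handler name "period_replace" and the function decode_dropping_bad are made up for this example.

```python
import codecs


def _period_handler(err):
    """Replace each undecodable byte with a period and resume after it."""
    if isinstance(err, UnicodeDecodeError):
        return ("." * (err.end - err.start), err.end)
    raise err


# Hypothetical handler name, registered only for this illustration.
codecs.register_error("period_replace", _period_handler)


def decode_dropping_bad(raw: bytes) -> str:
    """Decode mostly-UTF8 bytes, turning illegal bytes into periods."""
    return raw.decode("utf-8", errors="period_replace")


# A LATIN1-encoded n-tilde (0xF1) inside otherwise plain text:
print(decode_dropping_bad(b"Pe\xf1a Gacha"))  # Pe.a Gacha
```

Labels that are already valid UTF8 pass through unchanged, so the label is neither dropped nor truncated; only the offending byte is lost.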
On 2010-08-07, at 4:25 AM, Adrian Custer <adrian.custer at geomatys.fr> wrote:
> On Fri, 2010-08-06 at 12:24 -0700, Paul Ramsey wrote:
>> Cédric,
>> I have patched shp2pgsql to allow it to swallow the illegal non-utf8
>> characters, since that seems like the cleanest approach with the data
>> being largely in utf8.
>
>
> What does 'swallow' mean? Are we left without the label, with a
> truncated (and potentially false) label, or with a label in which the
> missing characters are present with some kind of spacer marker?
>
> --adrian
>
>> The data is loading up into the main database
>> right now. If you want to load your own database, you'll need to pull
>> and build the latest version of the 1.5 PostGIS branch, and set
>> UTF8_DROP_BAD_CHARACTERS to 1 in the shp2pgsql-core.c file.
>> Best,
>> Paul
>>
>> On Fri, Aug 6, 2010 at 10:28 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
>>> Cedric,
>>>
>>> The problem is now clear to me, thanks for following up. The data in
>>> the DBF files is in fact kind of corrupt. That is, it is in a mix of
>>> encodings. Most of it appears to be UTF8, but there are also some
>>> LATIN1 characters in there too. And LATIN1-encoded accented
>>> characters are generally illegal in UTF8. So you can try to load using
>>> UTF8 and the loader will fail when it hits the illegal character. Or
>>> you can load using LATIN1, and the UTF8 characters will end up as
>>> random-looking high-bit characters in the database.
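
Both failure modes can be reproduced outside the database. The following is a small Python sketch, assuming the label from Cédric's session ("Peña Gacha") as sample data; the function names are hypothetical and this only imitates what a loader does when it is told one encoding and given another.

```python
from typing import Optional


def load_as_latin1(raw: bytes) -> str:
    """Interpret every byte as one LATIN1 character; this never fails."""
    return raw.decode("latin-1")


def load_as_utf8(raw: bytes) -> Optional[str]:
    """Strict UTF8 decoding; returns None where a loader would abort."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


label = "Pe\u00f1a Gacha"             # n-tilde written as an escape
utf8_bytes = label.encode("utf-8")     # the tilde becomes 0xC3 0xB1
latin1_bytes = label.encode("latin-1") # the tilde becomes 0xF1

# UTF8 data read as LATIN1: two high-bit characters replace the n-tilde.
garbled = load_as_latin1(utf8_bytes)

# LATIN1 data read as strict UTF8: the lone 0xF1 byte is rejected.
rejected = load_as_utf8(latin1_bytes)
```

Here garbled contains the same two-character sequence seen in the etiqueta column, and rejected is None, which corresponds to the loader stopping at the illegal character.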
>>>
>>> Soooo.... not sure where that leaves us. I just checked the latest
>>> version of shp2pgsql in SVN and we don't have support for handling
>>> corrupt strings in UTF8 right now. I guess that could be my
>>> contribution to the benchmarking effort. In the meantime, I'm pleased
>>> to see OSM continuing their strong devotion to standards.
>>>
>>> P.
>>>
>>> On Fri, Aug 6, 2010 at 5:48 AM, Cédric Briançon
>>> <cedric.briancon at geomatys.fr> wrote:
>>>> Le 29/07/2010 21:04, Paul Ramsey a écrit :
>>>>>
>>>>> Host 12.189.158.77
>>>>> User postgres
>>>>> Password postgres
>>>>> Access only available from the local network
>>>>> Tables
>>>>> Building
>>>>> Contour
>>>>> Industry
>>>>> Motorway
>>>>> Point_labels_for_geometry
>>>>> Point_labels_no_geometry
>>>>> Ramp
>>>>> Road
>>>>> Settlement
>>>>> Track
>>>>>
>>>>>
>>>>> P.
>>>>> _______________________________________________
>>>>> Benchmarking mailing list
>>>>> Benchmarking at lists.osgeo.org
>>>>> http://lists.osgeo.org/mailman/listinfo/benchmarking
>>>>>
>>>>>
>>>>>
>>>>
>>>> Hi Paul,
>>>>
>>>> I've imported the shapefiles into a personal database, using the encoding
>>>> LATIN1 (ISO-8859-1) as you advised me on IRC. The import worked, but I
>>>> have strange characters in the point label tables.
>>>> So I checked on the database server for the benchmarking session, and
>>>> it displays the same wrong characters.
>>>>
>>>> If you want to check on the benchmarking server, here is a copy of my shell
>>>> session:
>>>>
>>>> psql -U postgres
>>>>
>>>> postgres=# \c benchmarking
>>>> psql (8.4.4)
>>>> You are now connected to database "benchmarking".
>>>> benchmarking=# select etiqueta from point_labels_for_geometry where gid='4';
>>>> etiqueta
>>>> -------------
>>>> Peña Gacha
>>>> (1 row)
>>>>
>>>> In the etiqueta field, the result seems to contain wrongly encoded
>>>> characters. I'm not a specialist in this domain; could you please check
>>>> whether something is wrong here?
>>>> I've also tried switching my bash shell to en_US.iso88591 (with the LANG
>>>> value), just to check, but the result is the same for this PostgreSQL
>>>> query.
>>>> I don't know if the database needs a "client_encoding" setting; maybe it
>>>> can help. Or maybe the database should be in another encoding before
>>>> shp2pgsql is used.
>>>>
>>>> This specific field is the one that we request in the WMS style for this
>>>> layer, so if the database has wrong characters in it, the WMS will display
>>>> them as they are. That's why I have a particular interest in it.
>>>>
>>>> Anyway, thanks for the import into postgis.
>>>> Cédric Briançon.
>>>>
>>>