[Benchmarking] Pgsql set up
adrian.custer at geomatys.fr
Sat Aug 7 07:25:10 EDT 2010
On Fri, 2010-08-06 at 12:24 -0700, Paul Ramsey wrote:
> I have patched shp2pgsql to allow it to swallow the illegal non-utf8
> characters, since that seems like the cleanest approach with the data
> being largely in utf8.
What does 'swallow' mean? Are we left without the label, with a
truncated (and potentially false) label, or with a label in which the
missing characters are replaced by some kind of placeholder marker?
> The data is loading up into the main database
> right now. If you want to load your own database, you'll need to pull
> and build the latest version of the 1.5 PostGIS branch, and set the
> UTF8_DROP_BAD_CHARACTERS to 1 in the shp2pgsql-core.c file.
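A minimal sketch of what "swallowing" could mean. It assumes the UTF8_DROP_BAD_CHARACTERS option simply discards bytes that are not valid UTF8; the patch itself is not shown in this thread. Python's codec error handlers show both outcomes asked about above: "ignore" drops the offending byte, and "replace" leaves a U+FFFD marker in its place. The field value and helper names are hypothetical.

```python
# Hypothetical DBF attribute: "Peña Gacha" with the "ñ" stored as the
# single LATIN1 byte 0xF1, which is not a valid UTF8 sequence.
raw = b"Pe\xf1a Gacha"


def load_strict(value: bytes) -> str:
    """Strict UTF8 decoding: raises on the first invalid byte."""
    return value.decode("utf-8")


def load_drop_bad(value: bytes) -> str:
    """Silently drop invalid bytes (assumed behaviour of the drop option)."""
    return value.decode("utf-8", errors="ignore")


def load_with_marker(value: bytes) -> str:
    """Keep a visible U+FFFD replacement character where the invalid byte was."""
    return value.decode("utf-8", errors="replace")


if __name__ == "__main__":
    try:
        load_strict(raw)
    except UnicodeDecodeError as exc:
        print("strict load fails:", exc)
    print(load_drop_bad(raw))            # Pea Gacha
    print(ascii(load_with_marker(raw)))  # 'Pe\ufffda Gacha'
```

Under this assumption the label survives but silently becomes "Pea Gacha" (the "potentially false label" case); the "replace" handler corresponds to the placeholder-marker case.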
> On Fri, Aug 6, 2010 at 10:28 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> > Cedric,
> > The problem is now clear to me, thanks for following up. The data in
> > the DBF files is in fact kind of corrupt. That is, it is in a mix of
> > encodings. Most of it appears to be UTF8, but there are also some
> > LATIN1 characters in there too. And LATIN1 high-bit characters (such
> > as the single byte 0xF1 for "ñ") are not valid UTF8. So you can try to
> > load using UTF8 and the loader will fail when it hits the illegal
> > character. Or you can load using LATIN1, and the UTF8 characters will
> > end up as random-looking high-bit characters in the database.
> > Soooo.... not sure where that leaves us. I just checked the latest
> > version of shp2pgsql in SVN and we don't have support for handling
> > corrupt strings in UTF8 right now. I guess that could be my
> > contribution to the benchmarking effort. In the meanwhile, I'm pleased
> > to see OSM continuing their strong devotion to standards.
> > P.
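The two failure modes described above can be reproduced in a few lines, using the "Peña Gacha" label from Cédric's message as an example. The UTF8 bytes for "ñ" (0xC3 0xB1) read as LATIN1 become the two characters "Ã±", which matches the psql output reported further down. The LATIN1 byte 0xF1 read as UTF8 is rejected outright. The `decode_mixed` helper is a hypothetical alternative strategy (try UTF8 per field, fall back to LATIN1), not what shp2pgsql does.

```python
label = "Peña Gacha"

utf8_bytes = label.encode("utf-8")      # b"Pe\xc3\xb1a Gacha"
latin1_bytes = label.encode("latin-1")  # b"Pe\xf1a Gacha"

# Loading UTF8 data as LATIN1 never fails, but each UTF8 byte becomes
# its own character: 0xC3 -> "Ã", 0xB1 -> "±".
mojibake = utf8_bytes.decode("latin-1")


def utf8_rejects(data: bytes) -> bool:
    """True if the bytes are not valid UTF8 (the strict loader would fail)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def decode_mixed(data: bytes) -> str:
    """Hypothetical per-field fallback: try UTF8 first, then LATIN1."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("latin-1")


if __name__ == "__main__":
    print(mojibake)                    # PeÃ±a Gacha
    print(utf8_rejects(latin1_bytes))  # True
    print(decode_mixed(utf8_bytes), "/", decode_mixed(latin1_bytes))
```

The fallback recovers both spellings here. Its weakness is that a LATIN1 value which happens to form valid UTF8 would be misread, although that is rare for ordinary place names.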
> > On Fri, Aug 6, 2010 at 5:48 AM, Cédric Briançon
> > <cedric.briancon at geomatys.fr> wrote:
> > >> On 29/07/2010 21:04, Paul Ramsey wrote:
> > >>> Host: 126.96.36.199
> > >>> User: postgres
> > >>> Password: postgres
> > >>> Access only available from the local network
> > >>> Tables:
> > >>>   Building
> > >>>   Contour
> > >>>   Industry
> > >>>   Motorway
> > >>>   Point_labels_for_geometry
> > >>>   Point_labels_no_geometry
> > >>>   Ramp
> > >>>   Road
> > >>>   Settlement
> > >>>   Track
> >>> P.
> >>> _______________________________________________
> >>> Benchmarking mailing list
> >>> Benchmarking at lists.osgeo.org
> >>> http://lists.osgeo.org/mailman/listinfo/benchmarking
> >> Hi Paul,
> >> I've imported the shapefiles into a personal database, using the encoding
> > >> LATIN1 (ISO-8859-1), as you advised me on IRC. The import worked, but I
> >> have strange characters in the point label tables.
> > >> So I checked the database server for the benchmarking session, and
> > >> it displays the same wrong characters.
> >> If you want to check on the benchmarking server, here is a copy of my shell
> >> session:
> >> psql -U postgres
> >> postgres=# \c benchmarking
> >> psql (8.4.4)
> >> You are now connected to database "benchmarking".
> >> benchmarking=# select etiqueta from point_labels_for_geometry where gid='4';
> >> etiqueta
> >> -------------
> >> PeÃ±a Gacha
> >> (1 row)
> > >> In the etiqueta field, the result seems to contain wrongly encoded
> > >> characters. I'm not a specialist in this area; could you please check
> > >> whether something is wrong here?
> > >> I've also tried switching my bash shell to en_US.iso88591 (via the LANG
> > >> variable), just to check, but the result is the same for this PostgreSQL
> > >> query.
> > >> I don't know whether the database needs a "client_encoding" setting;
> > >> maybe that would help. Or maybe the database should be in another
> > >> encoding before running shp2pgsql.
> > >> This specific field is the one requested by the WMS style for this layer,
> > >> so if the database contains wrong characters, the WMS will display them
> > >> as they are; that's why I have a particular interest in it.
> >> Anyway, thanks for the import into postgis.
> >> Cédric Briançon.
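A sketch of why client_encoding alone probably would not help, assuming the benchmarking database is UTF8 and the label was stored as the literal characters "Ã±" (UTF8 bytes converted from LATIN1 at load time). In that case client_encoding only changes how those characters are sent to the client, not the characters themselves. The damage can in principle be reversed per value by re-encoding as LATIN1 and decoding as UTF8. `repair_double_encoded` is a hypothetical helper, not part of PostGIS; it leaves a value untouched when the round trip fails, which is the case for fields that really were LATIN1.

```python
def repair_double_encoded(value: str) -> str:
    """Undo UTF8-read-as-LATIN1 damage; return the value unchanged if that fails."""
    try:
        return value.encode("latin-1").decode("utf-8")
    except UnicodeEncodeError:
        # Characters outside LATIN1: the value was not produced by a LATIN1 misread.
        return value
    except UnicodeDecodeError:
        # Bytes are not valid UTF8, e.g. a genuinely LATIN1 "ñ" (single 0xF1 byte).
        return value


if __name__ == "__main__":
    print(repair_double_encoded("PeÃ±a Gacha"))  # Peña Gacha
    print(repair_double_encoded("Peña Gacha"))   # Peña Gacha (left unchanged)
```

Plain ASCII values pass through unchanged. A correctly stored value that happens to look like mojibake (for example a real "Ã" followed by "©") would be "repaired" wrongly, so a check on a sample of rows would be prudent before applying this in bulk.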