[Benchmarking] Pgsql set up

Adrian Custer adrian.custer at geomatys.fr
Mon Aug 9 04:01:36 EDT 2010


On Sat, 2010-08-07 at 09:22 -0700, Paul Ramsey wrote:
> Instead of the character there will be a period.
> 
> P.
> 

Very nice. Thanks for that work!

--adrian

> On 2010-08-07, at 4:25 AM, Adrian Custer <adrian.custer at geomatys.fr> wrote:
> 
> > On Fri, 2010-08-06 at 12:24 -0700, Paul Ramsey wrote:
> >> Cédric,
> >> I have patched shp2pgsql to allow it to swallow the illegal non-utf8
> >> characters, since that seems like the cleanest approach with the data
> >> being largely in utf8.
> > 
> > 
> > What does 'swallow' mean? Are we left without the label, with a
> > truncated (and potentially false) label, or with a label in which the
> > missing characters are present with some kind of spacer marker?
> > 
> > --adrian
> > 
> >> The data is loading up into the main database
> >> right now. If you want to load your own database, you'll need to pull
> >> and build the latest version of the 1.5 PostGIS branch, and set the
> >> UTF8_DROP_BAD_CHARACTERS to 1 in the shp2pgsql-core.c file.
> >> Best,
> >> Paul
> >> 
> >> On Fri, Aug 6, 2010 at 10:28 AM, Paul Ramsey <pramsey at cleverelephant.ca> wrote:
> >>> Cedric,
> >>> 
> >>> The problem is now clear to me, thanks for following up. The data in
> >>> the DBF files is in fact kind of corrupt. That is, it is in a mix of
> >>> encodings. Most of it appears to be UTF8, but there are also some
> >>> LATIN1 characters in there too. And LATIN1 characters are illegal in
> >>> UTF8. So you can try to load using UTF8 and the loader will fail when
> >>> it hits the illegal character. Or you can load using LATIN1, and the
> >>> UTF8 characters will end up as random-looking high-bit characters in the
> >>> database.
> >>> 
> >>> Soooo.... not sure where that leaves us. I just checked the latest
> >>> version of shp2pgsql in SVN and we don't have support for handling
> >>> corrupt strings in UTF8 right now. I guess that could be my
> >>> contribution to the benchmarking effort. In the meanwhile, I'm pleased
> >>> to see OSM continuing their strong devotion to standards.
> >>> 
> >>> P.
> >>> 
> >>> On Fri, Aug 6, 2010 at 5:48 AM, Cédric Briançon
> >>> <cedric.briancon at geomatys.fr> wrote:
> >>>> On 29/07/2010 21:04, Paul Ramsey wrote:
> >>>>> 
> >>>>> Host 12.189.158.77
> >>>>> User postgres
> >>>>> Password postgres
> >>>>> Access only available from the local network
> >>>>> Tables
> >>>>>  Building
> >>>>>  Contour
> >>>>>  Industry
> >>>>>  Motorway
> >>>>>  Point_labels_for_geometry
> >>>>>  Point_labels_no_geometry
> >>>>>  Ramp
> >>>>>  Road
> >>>>>  Settlement
> >>>>>  Track
> >>>>> 
> >>>>> 
> >>>>> P.
> >>>>> _______________________________________________
> >>>>> Benchmarking mailing list
> >>>>> Benchmarking at lists.osgeo.org
> >>>>> http://lists.osgeo.org/mailman/listinfo/benchmarking
> >>>>> 
> >>>>> 
> >>>>> 
> >>>> 
> >>>> Hi Paul,
> >>>> 
> >>>> I've imported the shapefiles into a personal database, using the encoding
> >>>> LATIN1 (ISO-8859-1) as you advised me on IRC. The import worked, but I
> >>>> have strange characters in the point label tables.
> >>>> So I have checked on the database server for the benchmarking session, and
> >>>> it displays the same wrong character.
> >>>> 
> >>>> If you want to check on the benchmarking server, here is a copy of my shell
> >>>> session:
> >>>> 
> >>>> psql -U postgres
> >>>> 
> >>>> postgres=# \c benchmarking
> >>>> psql (8.4.4)
> >>>> You are now connected to database "benchmarking".
> >>>> benchmarking=# select etiqueta from point_labels_for_geometry where gid='4';
> >>>> etiqueta
> >>>> -------------
> >>>> Peña Gacha
> >>>> (1 row)
> >>>> 
> >>>> In the etiqueta field, the result seems to contain a wrongly encoded
> >>>> character. I'm not a specialist in this domain; could you please check
> >>>> whether something is wrong here?
> >>>> I've also tried switching my bash shell to en_US.iso88591 (with the LANG
> >>>> value), just to check, but the result is the same for this PostgreSQL
> >>>> query.
> >>>> I don't know if the database needs a "client_encoding" setting; maybe it
> >>>> would help. Or maybe the database should be in another encoding before
> >>>> shp2pgsql is used.
> >>>> 
> >>>> This specific field is the one that we request in the WMS style on this
> >>>> layer, so if the database has wrong characters in it, the WMS will display
> >>>> them as they are; that is why I have a particular interest in it.
> >>>> 
> >>>> Anyway, thanks for the import into postgis.
> >>>> Cédric Briançon.
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>>> 
> >>> 
> >> 
> > 
> > 
> 
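
The encoding problem discussed in this thread can be reproduced in a few
lines. What follows is a minimal Python sketch, not the shp2pgsql code (which
is written in C); the function names are hypothetical and chosen only for
illustration. The first function shows how a UTF8 string looks when its bytes
are read as LATIN1, as in the etiqueta output quoted above. The second decodes
mostly-UTF8 bytes and puts a period in place of each invalid byte, which
approximates the behaviour Paul describes for the patched loader, under the
assumption that one period stands in for each bad byte.

```python
def latin1_view_of_utf8(text: str) -> str:
    """Return `text` as it appears when its UTF-8 bytes are read as LATIN1."""
    return text.encode("utf-8").decode("latin-1")


def decode_utf8_replacing_bad_bytes(raw: bytes, marker: str = ".") -> str:
    """Decode mostly-UTF-8 bytes, substituting `marker` for invalid bytes.

    Assumption: one marker per invalid byte sequence reported by the decoder.
    Caveat: a genuine U+FFFD already present in the data is replaced as well.
    """
    decoded = raw.decode("utf-8", errors="replace")
    return decoded.replace("\ufffd", marker)


if __name__ == "__main__":
    # UTF-8 data loaded as LATIN1: the two bytes of "ñ" become two characters.
    print(latin1_view_of_utf8("Peña Gacha"))

    # Mixed data: a valid UTF-8 "ñ" followed by a stray LATIN1 "ñ" (byte 0xF1).
    mixed = "Peña".encode("utf-8") + b" / Pe\xf1a"
    print(decode_utf8_replacing_bad_bytes(mixed))
```

With mixed input like this, a strict UTF8 decode raises an error at the stray
byte, which corresponds to the loader failure described earlier in the thread.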



