[postgis-users] "Linux" geocoder script ?

Paragon Corporation lr at pcorp.us
Tue Apr 12 23:32:45 PDT 2011


Don,
 
Which state were you processing?  I can check it out and see if I get
similar errors on my shp2pgsql.  You could be right and the file just isn't
Latin1.
 
The regress test did seem to pass for me once that ticket was fixed.
 
Also, can you confirm you are running the latest version of shp2pgsql?

If you run shp2pgsql from the command line, it should output the version.
Mine, for example, reads:
 
RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6925 2011-03-18 16:24:33Z pramsey $)

The version unfortunately isn't quite accurate, since it's evidently looking
at the .h file instead of the .c file.  So though my version says 6925, it's
really 6932 or later.
 
http://trac.osgeo.org/postgis/changeset/6932
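
If it helps, here is a rough sketch of pulling the revision number out of that
banner so it can be compared against changeset 6932. The banner string is the
one quoted in this message, written on one line. The variable names are only
illustrative, and the commented-out line assumes shp2pgsql prints its usage
text when run without arguments.

```shell
# Banner text copied from this message (single quotes keep $Id literal).
banner='RELEASE: 2.0 USE_GEOS=1 USE_PROJ=1 USE_STATS=1 (r$Id: shp2pgsql-core.h 6925 2011-03-18 16:24:33Z pramsey $)'

# To use a live banner instead (assumption: usage text goes to stdout or stderr):
# banner=$(shp2pgsql 2>&1 | grep 'RELEASE:')

# Pull out the revision number that follows the header file name.
revision=$(printf '%s\n' "$banner" | sed -n 's/.*shp2pgsql-core\.h \([0-9][0-9]*\) .*/\1/p')
printf 'reported revision: %s\n' "$revision"

# The banner tracks the .h file, so a number below 6932 does not by itself
# prove the fix is missing; it only means the banner cannot confirm it.
if [ -n "$revision" ] && [ "$revision" -lt 6932 ]; then
    printf 'banner is older than r6932; check the build date or checkout instead\n'
fi
```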
 
Hope that helps,
Regina
http://www.postgis.us
 
 
  _____  

From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Don
Sent: Tuesday, April 12, 2011 3:08 AM
To: PostGIS Users Discussion
Subject: Re: [postgis-users] "Linux" geocoder script ?


My database is encoded as
 geocoder  | drh      | UTF8     | C         | en_US.UTF-8 | .
All my shp2pgsql statements have the -W option, like this:
${loader} -a -s 4269 -g the_geom -W "latin1" $z ${staging_schema}.${state_abbrev}_${table_name} | $PGBIN/psql -d $PGDATABASE;

Here is the bug that I was referring to.
http://trac.osgeo.org/postgis/ticket/808
In one case I had a very large number of inserts processed for the shape
file and then got that error.

From your link it says:
"To enable automatic character set conversion, you have to tell PostgreSQL
the character set (encoding) you would like to use in the client. There are
several ways to accomplish this: "
Perhaps I need to run SET CLIENT_ENCODING TO 'value'; in psql, or is shp2pgsql
supposed to do that when I use the -W option?

PostGIS seems to be expecting UTF-8 when it should be expecting Latin1 and
converting it to UTF-8.

Could the data type of a column have some effect on this?

 

On 04/11/2011 08:52 PM, Sylvain Racine wrote: 

Hello, 

This is not a shp2pgsql bug. You get this error when you try to insert
string data into PostgreSQL in an encoding other than that of your
database. For example, your data is encoded in Latin1 (ISO-8859-1) and you
insert it into a UTF-8 database. To fix the error, you need to convert your
data.
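
As a rough illustration of that conversion, the sketch below feeds the three
bytes from the error message quoted further down in this thread (0xed 0x6f
0x20) through iconv. It assumes the iconv and od commands are installed; the
variable names are only illustrative.

```shell
# The bytes from the error: 0xed 0x6f 0x20 (octal 355, then "o", then a space).

# Check 1: are they valid UTF-8 as they stand? iconv exits non-zero if not.
if printf '\355o ' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    valid_utf8=yes
else
    valid_utf8=no
fi
printf 'valid as UTF-8: %s\n' "$valid_utf8"

# Check 2: read them as Latin1, convert to UTF-8, and dump the result as hex.
converted_hex=$(printf '\355o ' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1 | tr -d ' \n')
printf 'Latin1 -> UTF-8 bytes: %s\n' "$converted_hex"
```

Read as Latin1, 0xed is the letter "í", which converts to the two UTF-8 bytes
c3 ad. That suggests the bytes are ordinary accented text that reached the
server unconverted, not corrupted data.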

PostgreSQL has an internal converter, and shp2pgsql has one too. Try shp2pgsql
-W <encoding>, where <encoding> is the encoding of your dBase (.dbf) file. This
is called the "client encoding" in PostgreSQL. See the list of valid encodings:

http://www.postgresql.org/docs/9.0/static/multibyte.html 

Don't confuse it with the database encoding, which is the one you use to
create your database. There is also a default database character set, which
depends on your OS. It is the one used to create the template1 database at
initdb. Mine is "UTF8" on Ubuntu.

Hope that this information will help you 

Regards 

Sylvain Racine 

On 2011-04-11 21:22, Don wrote: 


I have got the tiger2010 geocoder to work on my openSUSE system.
geocoder=# 
geocoder=# SELECT g.rating, 
geocoder-#         ST_X(geomout) As lon, 
geocoder-#         ST_Y(geomout) As lat, (addy).* 
geocoder-# FROM geocode('1731 New Hampshire Avenue Northwest, Washington, DC 20010') As g;
 rating |        lon        |       lat        | address | predirabbrev |  streetname   | streettypeabbrev | postdirabbrev | internal |  location  | stateabbrev |  zip  | parsed
--------+-------------------+------------------+---------+--------------+---------------+------------------+---------------+----------+------------+-------------+-------+--------
      0 | -77.0399013800607 | 38.9134181361424 |    1731 |              | New Hampshire | Ave              | NW            |          | Washington | DC          | 20009 | t
(1 row) 
There are a few glitches.  I noticed that I am getting this message
sometimes. 
INSERT 0 1 
INSERT 0 1 
INSERT 0 1 
INSERT 0 1 
ERROR:  invalid byte sequence for encoding "UTF8": 0xed6f20 
HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
ERROR:  current transaction is aborted, commands ignored until end of transaction block
ERROR:  current transaction is aborted, commands ignored until end of transaction block
ERROR:  current transaction is aborted, commands ignored until end of transaction block
I researched this some and it appears to be a shp2pgsql bug.
But I am using postgis-utils-2.0.0SVN-1.2.x86_64 and
postgis-2.0.0SVN-1.2.x86_64, where this has supposedly been fixed.  Or could
the census data be corrupted?
So I have "lost" some of the data due to this error. 
I also had problems with psql generating ctrl-m instead of \n, which would
really mess up the script when it ran.
So after I generated my load tiger script, I ran this command:
tr "\r" "\n" < load_tiger > load_tiger2 
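
A small sketch of what that substitution does, assuming the stray ctrl-m
characters come from CRLF line endings (an assumption; the two-line sample
below is made up for illustration). Replacing \r with \n leaves an empty line
after every command, which the shell ignores, while deleting \r keeps the
original line count.

```shell
# Hypothetical two-line script with CRLF line endings.
crlf_sample() { printf 'echo one\r\necho two\r\n'; }

# Replacing CR with LF (the command above) doubles every line break.
replaced_lines=$(crlf_sample | tr '\r' '\n' | wc -l | tr -d ' ')

# Deleting CR instead keeps one line break per command.
deleted_lines=$(crlf_sample | tr -d '\r' | wc -l | tr -d ' ')

printf 'replace CR: %s lines, delete CR: %s lines\n' "$replaced_lines" "$deleted_lines"
```

Either form should leave a script the shell can run; tr -d '\r' is just the
tidier of the two when the input really is CRLF.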

_______________________________________________ 
postgis-users mailing list 
postgis-users at postgis.refractions.net 
http://postgis.refractions.net/mailman/listinfo/postgis-users 



