[postgis-users] Geocode function fails when callingnormalize_address

Paragon Corporation lr at pcorp.us
Thu Sep 6 17:13:45 PDT 2012

Regarding your normalize issue, it's probably one record causing it.  I
thought we fixed some of these issues in 2.0.1 and 2.1.0SVN so you might
want to upgrade your script to the 2.0.1 -- there is an upgrade script you
can use packaged in the tar ball.
If that still doesn't fix your issue, look for variables often found in
regex expressions in your address (things like (, )  and . for example might
be throwing it off and we might not be escaping right.  Hard to tell which
one until your narrow down to the one bad record.
You might have to run each update as a single to see which one fails
Couple of other tips
1) you don't want to geocode 100K records all at once since it has to
complete as a single transaction.  You'll need to do it in batches.  We use
pgScript for that batch processing so we can just run in a pgAdmin window.
though any scripting tool of your choice would do.
2) Don't do (geocode(address)).*   That often calls geocode for each field
in output so would slow down your processing n fold (e.g. you'd have like 5
calls instead of 1)
You really want to do geocode(address) As geo
(geo).geomout etc. 
See example in docs --
http://www.postgis.org/documentation/manual-svn/Geocode.html  -- we do a
left join to guarantee that even if no records returned for an address, we
can stamp the address as attempted with a -1 rating
I think we thought about this after we wrote the book :(
On bright side, we just signed our contract for Second Edition of PostGIS in
Action and have started writing it :)  more on that later.  that will cover
newer enhancements in geocoder, raster, topology, PostGIS 2.0-2.1 (basically
at least 2.0 and 2.1 changes)  and PostgreSQL 9.1-9.3 
We'll be posting the new Table Of Contents soon here, and let people know
when they can start purchasing
Those who buy the new book, is our understanding, will automatically get
electronic copies of the first edition, so even if you buy early, you'll get
our drafts as we write them and also have the older book to work with.
3) The book was written before we introduced the limit, so again look at:
http://www.postgis.org/documentation/manual-svn/Geocode.html  (feature is
available in 2.0 as well)
Use: geocode(address,1)
for faster performance and then you don't even need the DISTINCT ON since
you'll get back at most one answer and the best one.
Hope that helps,
Leo and Regina


From: postgis-users-bounces at postgis.refractions.net
[mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
Robert_Clift at doh.state.fl.us
Sent: Thursday, September 06, 2012 1:01 PM
To: postgis-users at postgis.refractions.net
Subject: [postgis-users] Geocode function fails when

Hi All: 

My batch geocode of 100K records is failing with error messages that refer
to an operand in the normalize_address function. I based my single-column
UPDATE on the multi-column example in "PostGIS In Action" and have tweaked
the syntax only to achieve slight variation in the error message.

The query is something like: 

UPDATE respondents_addr SET the_geom = 
   FROM (SELECT DISTINCT ON (respondent, s_year) respondent, s_year,
      FROM respondents_addr As ra 
         WHERE ra.address IS NOT NULL 
      ORDER BY respondent, s_year, rating) As g 
      WHERE g.respondent = respondents_addr.respondent 
      AND g.s_year = respondents_addr.s_year; 

The most frequent error text is: 

ERROR:  invalid regular expression: quantifier operand invalid 
CONTEXT:  PL/pgSQL function "normalize_address" line 386 at assignment 
PL/pgSQL function "geocode" line 10 at assignment 
********** Error ********** 
ERROR: invalid regular expression: quantifier operand invalid 
SQL state: 2201B 
Context: PL/pgSQL function "normalize_address" line 386 at assignment 
PL/pgSQL function "geocode" line 10 at assignment 

Also the query runs for more than an hour, (xp box with 3.5 GB ram,
PosgreSQL 9.0.6, PostGIS 2.0.0 (might be the problem?)) and no matter how
the error message changes it always references the normalization function,
so I tried:

SELECT (normalize_address(address)).* FROM respondents_addr WHERE address IS

With the LIMIT it ran quickly and returned the expected result, so I tried: 

SELECT (normalize_address(address)).* FROM respondents_addr WHERE address IS

Bumping the LIMIT up reproduced the error from the geocode attempts. I'd say
"aha" if only I knew why. 
I checked the list archives and googled several combinations of words from
the error but alas, to no avail. I suspect that I've either repeated a
simple syntax error or that my setup is somehow off -- any help will be
appreciatedand I might as well close with results of PostGIS_full_version():

POSTGIS="2.0.0 r9605" GEOS="3.3.3-CAPI-1.7.4" PROJ="Rel. 4.8.0, 6 March
2012" GDAL="GDAL 1.9.0, released 2011/12/29" LIBXML="2.7.8"

Thank you, 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20120906/47b42756/attachment.html>

More information about the postgis-users mailing list