[postgis-tickets] [PostGIS] #2260: Benchmarking speed between built-in tiger normalizer and pagc_address_parser

PostGIS trac at osgeo.org
Wed Apr 24 17:20:27 PDT 2013


#2260: Benchmarking speed between built-in tiger normalizer and
pagc_address_parser
---------------------------------+------------------------------------------
 Reporter:  robe                 |       Owner:  robe         
     Type:  task                 |      Status:  new          
 Priority:  medium               |   Milestone:  PostGIS 2.1.0
Component:  pagc_address_parser  |     Version:  trunk        
 Keywords:                       |  
---------------------------------+------------------------------------------

Comment(by woodbri):

 OK, new code has been checked in to pagc/branches/sew-refactor/postgresql/

  * Now has a cache for the standardizer at the query level
  * Function signature has changed
  * Runs on mingw32 with PostgreSQL 9.2.3 built with --enable-cassert --enable-debug
  * Has one minor but annoying issue on Linux that I need to look into
  * Installs as CREATE EXTENSION address_standardizer;
  * psql -U postgres -h localhost -f test1.sql testdb (might have path issues)
  * psql -U postgres -h localhost -f test2.sql testdb (might have path issues)

 The signature is:

 {{{
  * The signature for standardize_address follows. The lextab, gaztab and
  * rultab tables should not change once the reference set has been
  * standardized, and the same tables must be used for a geocode request as
  * were used on the reference set, or the matching will be degraded.
  *
  *   select * from standardize_address(
  *       lextab text,  -- name of table or view
  *       gaztab text,  -- name of table or view
  *       rultab text,  -- name of table or view
  *       micro text,   -- '123 main st'
  *       macro text);  -- 'boston ma 01002'
  *
  * If you want to standardize a whole table then call it like:
  *
  *   insert into stdaddr (...)
  *       select (std).* from (
  *           select standardize_address(
  *               'lextab', 'gaztab', 'rultab', micro, macro) as std
  *             from table_to_standardize) as foo;
  *
  * The structure of the lextab and gaztab tables or views must be:
  *
  *    seq int4
  *    word text
  *    stdword text
  *    token int4
  *
  * The rultab table or view must have columns:
  *
  *    rule text
 }}}

 The problem on Linux is that every other time I run the command I get an
 error. This is probably because I'm not zeroing something out when I delete
 an item from the cache.

 {{{
 test1=# select * from standardize_address('lex'::text, 'gaz'::text, 'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
  building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra |    city     |  state   | country | postcode | box | unit
 ----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+----------+---------+----------+-----+------
           | 123       |        |      |         | MAIN | STREET  |        |            |       | KANSAS CITY | MISSOURI |         | 45678    |     |
 (1 row)

 test1=# select * from standardize_address('lex'::text, 'gaz'::text, 'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
 ERROR:  AddStdHashEntry: This memory context is already in use! (0xb89d9d08)
 test1=# select * from standardize_address('lex'::text, 'gaz'::text, 'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
  building | house_num | predir | qual | pretype | name | suftype | sufdir | ruralroute | extra |    city     |  state   | country | postcode | box | unit
 ----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+----------+---------+----------+-----+------
           | 123       |        |      |         | MAIN | STREET  |        |            |       | KANSAS CITY | MISSOURI |         | 45678    |     |
 (1 row)

 test1=# select * from standardize_address('lex'::text, 'gaz'::text, 'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
 ERROR:  AddStdHashEntry: This memory context is already in use! (0xb8e6b4a8)
 }}}

 It should be easy to find and fix. I also need to run this in single-user
 mode under valgrind to see if I have any memory leaks.

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2260#comment:27>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

