[postgis-tickets] [PostGIS] #2260: Benchmarking speed between built-in tiger normalizer and pagc_address_parser
PostGIS
trac at osgeo.org
Wed Apr 24 17:20:27 PDT 2013
#2260: Benchmarking speed between built-in tiger normalizer and
pagc_address_parser
---------------------------------+------------------------------------------
Reporter: robe | Owner: robe
Type: task | Status: new
Priority: medium | Milestone: PostGIS 2.1.0
Component: pagc_address_parser | Version: trunk
Keywords: |
---------------------------------+------------------------------------------
Comment(by woodbri):
OK, new code has been checked in to pagc/branches/sew-refactor/postgresql/
* Now has a cache for the standardizer at the query level
* Function signature has changed
* Runs on mingw32 pg9.2.3 built with --enable-cassert --enable-debug
* Has one minor but annoying issue on linux that I need to look into
* Installs with CREATE EXTENSION address_standardizer;
* psql -U postgres -h localhost -f test1.sql testdb (might have path
issues)
* psql -U postgres -h localhost -f test2.sql testdb (might have path
issues)
The signature is:
{{{
* The signature for standardize_address follows. The lextab, gaztab and
* rultab should not change once the reference has been standardized and
* the same tables must be used for a geocode request as were used on the
* reference set or the matching will get degraded.
*
* select * from standardize_address(
* lextab text, -- name of table or view
* gaztab text, -- name of table or view
* rultab text, -- name of table or view
* micro text, -- '123 main st'
* macro text); -- 'boston ma 01002'
*
* If you want to standardize a whole table then call it like:
*
* insert into stdaddr (...)
* select (std).* from (
* select standardize_address(
* 'lextab', 'gaztab', 'rultab', micro, macro) as std
* from table_to_standardize) as foo;
*
* The structure of the lextab and gaztab tables or views must be:
*
* seq int4
* word text
* stdword text
* token int4
*
* the rultab table or view must have columns:
*
* rule text
}}}
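As a rough sketch of the table layout described in that comment, assuming tables named lex, gaz and rules (the names used in the psql session further down). Keys, indexes and the actual lexicon, gazetteer and rule contents are not specified here, so they are left out:
{{{
-- Sketch only: table names are taken from the psql session in this comment;
-- column names and types follow the structure stated above.
CREATE TABLE lex (
    seq     int4,
    word    text,
    stdword text,
    token   int4
);

-- The gazetteer table uses the same four columns as the lexicon.
CREATE TABLE gaz (
    seq     int4,
    word    text,
    stdword text,
    token   int4
);

-- The rules table needs only a single text column.
CREATE TABLE rules (
    rule text
);
}}}
Views exposing the same column names and types should work equally well, since the function only takes the table or view names as text.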
The problem on Linux is that every other time I run the command I get an
error. This is probably because I'm not zeroing something out when I delete
an item from the cache.
{{{
test1=# select * from standardize_address('lex'::text, 'gaz'::text,
'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
building | house_num | predir | qual | pretype | name | suftype | sufdir
| ruralroute | extra | city | state | country | postcode | box |
unit
----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+----------+---------+----------+-----+------
| 123 | | | | MAIN | STREET |
| | | KANSAS CITY | MISSOURI | | 45678 | |
(1 row)
test1=# select * from standardize_address('lex'::text, 'gaz'::text,
'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
ERROR: AddStdHashEntry: This memory context is already in use!
(0xb89d9d08)
test1=# select * from standardize_address('lex'::text, 'gaz'::text,
'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
building | house_num | predir | qual | pretype | name | suftype | sufdir
| ruralroute | extra | city | state | country | postcode | box |
unit
----------+-----------+--------+------+---------+------+---------+--------+------------+-------+-------------+----------+---------+----------+-----+------
| 123 | | | | MAIN | STREET |
| | | KANSAS CITY | MISSOURI | | 45678 | |
(1 row)
test1=# select * from standardize_address('lex'::text, 'gaz'::text,
'rules'::text, '123 Main Street'::text, 'Kansas City, MO 45678'::text);
ERROR: AddStdHashEntry: This memory context is already in use!
(0xb8e6b4a8)
}}}
It should be easy to find and fix. I also need to run this in single-user
mode under valgrind to see if I have any memory leaks.
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2260#comment:27>
PostGIS <http://trac.osgeo.org/postgis/>