[postgis-tickets] [PostGIS] #2260: Benchmarking speed between built-in tiger normalizer and pagc_address_parser
PostGIS
trac at osgeo.org
Tue Apr 30 19:20:37 PDT 2013
#2260: Benchmarking speed between built-in tiger normalizer and
pagc_address_parser
----------------------------------+-----------------------------------------
Reporter: robe | Owner: robe
Type: task | Status: reopened
Priority: medium | Milestone: PostGIS 2.1.0
Component: pagc_address_parser | Version: trunk
Resolution: | Keywords:
----------------------------------+-----------------------------------------
Changes (by robe):
* status: closed => reopened
* resolution: fixed =>
Comment:
gosh darn it. Switching it to pure sql did not fix the issue. Still
considers each call a separate query and creates a std object for each.
Normally this would be fine except to completely swap out I have to merge
parse_address and standardize_address into a single query and that seems
like a lot to ask of people.
{{{ -- if I write the equivalent query like this:
WITH p AS (
SELECT parse_address(address) As var_parse_rec
FROM test_parse)
SELECT (to_number(substring( (var_rec).house_num, '[0-9]+'),
'99999999999')
,trim( (var_rec).predir)
, trim((var_rec).name)
,trim(COALESCE((var_rec).suftype, (var_rec).pretype))
, trim((var_rec).sufdir)
, trim((var_rec).unit)
,trim((var_rec).city)
, trim((var_rec).state)
, (var_rec).postcode
, true)::norm_addy
FROM (
SELECT standardize_address('pagc_lex'
, 'pagc_gaz'
, 'pagc_rules'
, COALESCE((var_parse_rec).address1,''),
COALESCE((var_parse_rec).city || ', ','') ||
COALESCE((var_parse_rec).state || ' ', '') ||
COALESCE((var_parse_rec).zip,'') ) As var_rec
FROM p) AS s;
}}}
Takes 46 ms vs.
{{{
CREATE OR REPLACE FUNCTION tiger.pagc_normalize_address2(in_rawinput
character varying)
RETURNS norm_addy AS
$$
SELECT (to_number(substring( (var_rec).house_num, '[0-9]+'),
'99999999999')
,trim( (var_rec).predir)
, trim((var_rec).name)
,trim(COALESCE((var_rec).suftype, (var_rec).pretype))
, trim((var_rec).sufdir)
, trim((var_rec).unit)
,trim((var_rec).city)
, trim((var_rec).state)
, (var_rec).postcode
, true)::norm_addy
FROM (
SELECT standardize_address('pagc_lex'
, 'pagc_gaz'
, 'pagc_rules'
, COALESCE(var_parse_rec.address1,''),
COALESCE(var_parse_rec.city || ', ','') || COALESCE(var_parse_rec.state
|| ' ', '') || COALESCE(var_parse_rec.zip,'') ) As var_rec
FROM parse_address($1) As var_parse_rec ) AS s
$$
LANGUAGE sql
COST 100;
}}}
{{{ -- takes 156 ms (6 times as long)
SELECT address, pagc_normalize_address2(address)
FROM test_parse;
}}}
The main issue , to be able to do a drop in replace, people just pass the
address as a single field, so you I need a version of the standardizer
that does both a parse and standardize in one call so people don't have to
struggle with a complicated query. This might be a good thing to have
anyway as I imagine a lot of people will have the addresss as a single
field.
Then all my function would need to do is define a cast between std and
norm_addy or I'd just write a geocode function that takes std instead of
norm_addy.
I'm reopening this since I think its hard for me to fix without changes to
pagc_address_parser, and will be an issue for others wanting to wrap the
function to wrap it efficiently.
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/2260#comment:36>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list