[postgis-devel] Query Performance Question
Stephen Woodbridge
woodbri at swoodbridge.com
Thu Oct 13 13:27:42 PDT 2016
Hi all,
I'm working on improving the performance of the new address standardizer
and I'm trying to figure out what is the "secret sauce" to make the
queries fastest.
My function is declared below for reference. Here are the timing on
various queries:
Query 1:
as_test=# select * from as_standardize(
'123 oak ln, st marie ny usa',
(select grammar from as_config where countrycode='us'),
(select lexicon from as_config where countrycode='us'),
'en_US',
(select filter from as_config where countrycode='us')
);
Time: 593.482 ms
Query 2:
as_test=# select (as_standardize(
'123 oak ln, st marie ny usa',
(select grammar from as_config where countrycode='us'),
(select lexicon from as_config where countrycode='us'),
'en_US',
(select filter from as_config where countrycode='us'))
).*;
Time: 9610.006 ms
The function return 16 columns and 16 * .5935 sec ~= 9.610 sec
Query 3:
as_test=# with cfg as (
select '123 oak ln, st marie ny usa'::text as addr, *
from as_config where countrycode='us'
)
select addr, (as_standardize(addr, grammar, clexicon, 'en_US',
filter)).* from cfg;
Time: 2533.146 ms
The real problem is when I want to standardize a whole table like:
Query 4:
as_test=# with cfg as (
select * from as_config where countrycode='us'
)
select t.id, t.addr, (as_standardize(t.addr, cfg.grammar,
cfg.clexicon, 'en_US', cfg.filter)).*
from tmp_addresses t, cfg
order by t.id;
Time: 46537.589 ms
The tmp_addresses table has 53 records so its taking on average 0.878
sec / record.
The as_config table has 25 records in it, but we only use 1.
I've already implemented a query level cache for the grammar and lexicon
objects which are expensive to construct and that improved performance.
I'm having trouble reconciling the various timings. It is possible that
the sum of the timings for the 53 records really is about 46.5 sec but
that does not explain the differences between Query 1, 2, and 3. The
difference between Query 1 and 2 seems obvious.
Is there a better way to standardize a whole table of addresses? These
issues aside, I'm still looking at ways to improve the performance of
the underlying C++ code.
The code can be found here if your interested:
https://github.com/woodbri/address-standardizer
http://imaptools.com:8080/dl/address-standardizer2/ (windows 64-bit)
Thoughts appreciated,
-Steve
CREATE OR REPLACE FUNCTION as_standardize(
address text,
grammar text,
lexicon text,
locale text,
filter text,
OUT building text,
OUT house_num text,
OUT predir text,
OUT qual text,
OUT pretype text,
OUT name text,
OUT suftype text,
OUT sufdir text,
OUT ruralroute text,
OUT extra text,
OUT city text,
OUT prov text,
OUT country text,
OUT postcode text,
OUT box text,
OUT unit text
)
RETURNS RECORD
AS '$libdir/address_standardizer2-2.0', 'as_standardize'
LANGUAGE 'c' STABLE STRICT;
I have also tried IMMUTABLE instead of STABLE but I get the same timing
results.
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the postgis-devel
mailing list