[postgis-devel] Query Performance Question
Stephen Woodbridge
woodbri at swoodbridge.com
Sun Oct 23 15:08:48 PDT 2016
OK, for the record, I modified the queries to look like the following
and was able to standardize a table of 226 records in about 19 sec or 88
ms per record on average.
-- before PostgreSQL 9.3 use a query like this
with tmp as (select a.id, address, as_standardize(
address,
grammar,
clexicon,
'en_US',
filter) as std
from as_config cfg, test_addresses a
where cfg.countrycode='us')
select id, (std).*, address
from tmp
order by id;
-- PostgreSQL 9.3+ use a query like this
select a.id, std.*, address
from test_addresses as a,
as_config cfg,
LATERAL as_standardize(
address,
grammar,
clexicon,
'en_US',
filter
) as std
where cfg.countrycode='us'
order by a.id;
So this is very close to the performance I get running against a command
line tester version of the address standardizer reading the grammar and
lexicon from files.
-Steve
On 10/13/2016 4:27 PM, Stephen Woodbridge wrote:
> Hi all,
>
> I'm working on improving the performance of the new address standardizer
> and I'm trying to figure out what is the "secret sauce" to make the
> queries fastest.
>
> My function is declared below for reference. Here are the timing on
> various queries:
>
> Query 1:
> as_test=# select * from as_standardize(
> '123 oak ln, st marie ny usa',
> (select grammar from as_config where countrycode='us'),
> (select lexicon from as_config where countrycode='us'),
> 'en_US',
> (select filter from as_config where countrycode='us')
> );
> Time: 593.482 ms
>
> Query 2:
> as_test=# select (as_standardize(
> '123 oak ln, st marie ny usa',
> (select grammar from as_config where countrycode='us'),
> (select lexicon from as_config where countrycode='us'),
> 'en_US',
> (select filter from as_config where countrycode='us'))
> ).*;
> Time: 9610.006 ms
>
> The function return 16 columns and 16 * .5935 sec ~= 9.610 sec
>
> Query 3:
> as_test=# with cfg as (
> select '123 oak ln, st marie ny usa'::text as addr, *
> from as_config where countrycode='us'
> )
> select addr, (as_standardize(addr, grammar, clexicon, 'en_US',
> filter)).* from cfg;
> Time: 2533.146 ms
>
> The real problem is when I want to standardize a whole table like:
>
> Query 4:
> as_test=# with cfg as (
> select * from as_config where countrycode='us'
> )
> select t.id, t.addr, (as_standardize(t.addr, cfg.grammar,
> cfg.clexicon, 'en_US', cfg.filter)).*
> from tmp_addresses t, cfg
> order by t.id;
> Time: 46537.589 ms
>
> The tmp_addresses table has 53 records so its taking on average 0.878
> sec / record.
> The as_config table has 25 records in it, but we only use 1.
>
> I've already implemented a query level cache for the grammar and lexicon
> objects which are expensive to construct and that improved performance.
>
> I'm having trouble reconciling the various timings. It is possible that
> the sum of the timings for the 53 records really is about 46.5 sec but
> that does not explain the differences between Query 1, 2, and 3. The
> difference between Query 1 and 2 seems obvious.
>
> Is there a better way to standardize a whole table of addresses? These
> issues aside, I'm still looking at ways to improve the performance of
> the underlying C++ code.
>
> The code can be found here if your interested:
> https://github.com/woodbri/address-standardizer
> http://imaptools.com:8080/dl/address-standardizer2/ (windows 64-bit)
>
> Thoughts appreciated,
> -Steve
>
> CREATE OR REPLACE FUNCTION as_standardize(
> address text,
> grammar text,
> lexicon text,
> locale text,
> filter text,
> OUT building text,
> OUT house_num text,
> OUT predir text,
> OUT qual text,
> OUT pretype text,
> OUT name text,
> OUT suftype text,
> OUT sufdir text,
> OUT ruralroute text,
> OUT extra text,
> OUT city text,
> OUT prov text,
> OUT country text,
> OUT postcode text,
> OUT box text,
> OUT unit text
> )
> RETURNS RECORD
> AS '$libdir/address_standardizer2-2.0', 'as_standardize'
> LANGUAGE 'c' STABLE STRICT;
>
> I have also tried IMMUTABLE instead of STABLE but I get the same timing
> results.
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the postgis-devel
mailing list