[postgis-devel] Query Performance Question

Stephen Woodbridge woodbri at swoodbridge.com
Sun Oct 23 15:08:48 PDT 2016


OK, for the record, I modified the queries to look like the following 
and was able to standardize a table of 226 records in about 19 sec or 88 
ms per record on average.

-- before PostgreSQL 9.3 use a query like this
with tmp as (select a.id, address, as_standardize(
                  address,
                  grammar,
                  clexicon,
                  'en_US',
                  filter) as std
              from as_config cfg, test_addresses a
             where cfg.countrycode='us')
select id, (std).*, address
   from tmp
  order by id;

-- PostgreSQL 9.3+ use a query like this
select a.id, std.*, address
   from test_addresses as a,
        as_config cfg,
        LATERAL as_standardize(
             address,
             grammar,
             clexicon,
             'en_US',
             filter
         ) as std
  where cfg.countrycode='us'
  order by a.id;

So this is very close to the performance I get running against a command 
line tester version of the address standardizer reading the grammar and 
lexicon from files.

-Steve

On 10/13/2016 4:27 PM, Stephen Woodbridge wrote:
> Hi all,
>
> I'm working on improving the performance of the new address standardizer
> and I'm trying to figure out what is the "secret sauce" to make the
> queries fastest.
>
> My function is declared below for reference. Here are the timing on
> various queries:
>
> Query 1:
> as_test=# select * from as_standardize(
>     '123 oak ln, st marie ny usa',
>     (select grammar from as_config where countrycode='us'),
>     (select lexicon from as_config where countrycode='us'),
>     'en_US',
>     (select filter from as_config where countrycode='us')
> );
> Time: 593.482 ms
>
> Query 2:
> as_test=# select (as_standardize(
>     '123 oak ln, st marie ny usa',
>     (select grammar from as_config where countrycode='us'),
>     (select lexicon from as_config where countrycode='us'),
>     'en_US',
>     (select filter from as_config where countrycode='us'))
> ).*;
> Time: 9610.006 ms
>
> The function return 16 columns and 16 * .5935 sec ~= 9.610 sec
>
> Query 3:
> as_test=# with cfg as (
>    select '123 oak ln, st marie ny usa'::text as addr, *
>      from as_config where countrycode='us'
> )
> select addr, (as_standardize(addr, grammar, clexicon, 'en_US',
> filter)).* from cfg;
> Time: 2533.146 ms
>
> The real problem is when I want to standardize a whole table like:
>
> Query 4:
> as_test=# with cfg as (
>    select * from as_config where countrycode='us'
> )
> select t.id, t.addr, (as_standardize(t.addr, cfg.grammar,
>                                 cfg.clexicon, 'en_US', cfg.filter)).*
>   from tmp_addresses t, cfg
>   order by t.id;
> Time: 46537.589 ms
>
> The tmp_addresses table has 53 records so its taking on average 0.878
> sec / record.
> The as_config table has 25 records in it, but we only use 1.
>
> I've already implemented a query level cache for the grammar and lexicon
> objects which are expensive to construct and that improved performance.
>
> I'm having trouble reconciling the various timings. It is possible that
> the sum of the timings for the 53 records really is about 46.5 sec but
> that does not explain the differences between Query 1, 2, and 3. The
> difference between Query 1 and 2 seems obvious.
>
> Is there a better way to standardize a whole table of addresses? These
> issues aside, I'm still looking at ways to improve the performance of
> the underlying C++ code.
>
> The code can be found here if your interested:
> https://github.com/woodbri/address-standardizer
> http://imaptools.com:8080/dl/address-standardizer2/ (windows 64-bit)
>
> Thoughts appreciated,
>   -Steve
>
> CREATE OR REPLACE FUNCTION as_standardize(
>         address text,
>         grammar text,
>         lexicon text,
>         locale text,
>         filter text,
>         OUT building text,
>         OUT house_num text,
>         OUT predir text,
>         OUT qual text,
>         OUT pretype text,
>         OUT name text,
>         OUT suftype text,
>         OUT sufdir text,
>         OUT ruralroute text,
>         OUT extra text,
>         OUT city text,
>         OUT prov text,
>         OUT country text,
>         OUT postcode text,
>         OUT box text,
>         OUT unit text
>         )
>     RETURNS RECORD
>     AS '$libdir/address_standardizer2-2.0', 'as_standardize'
>     LANGUAGE 'c' STABLE STRICT;
>
> I have also tried IMMUTABLE instead of STABLE but I get the same timing
> results.
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus




More information about the postgis-devel mailing list