[postgis-devel] postgis_tiger geocoding extension : France & temporal geocoding

Stephen Woodbridge woodbri at swoodbridge.com
Thu Jun 16 06:30:35 PDT 2016


Also, I forgot to mention that Reginia worked on windows builds for this 
and I maintain Windows development builds here:

http://imaptools.com:8080/dl/address-standardizer2/

-Steve

On 6/16/2016 9:15 AM, Stephen Woodbridge wrote:
> Hi Remi,
>
> I have started a project to create an address standardizer that can be
> internationalized with the longer term goal of redesigning the postgis
> geocoder to work with data from any data sources.
>
> https://github.com/woodbri/address-standardizer
>
> The design for a generic geocoder works like this:
>
> * load street data into a table with geometry
> * standardize the street data into a standardized table
> * given a query address
>   * standardize it
>   * generate a query to search for candidate records in the standized table
>   * score/filter the candidate records
>   * fetch the geometry and compute the location
>
> The address-standardizer needs some performance tuning as it has
> problems when the lexicon gets large and/or when the grammar gets too
> complex.
>
> I'm still fairly new at C++, and would be happy to get pull requests or
> a code review.
>
> I have a sample lexicon and grammar for France and it should be straight
> forward to add a dimension like date to the scheme above and filter
> candidate records based on dates in the scoring process.
>
> I would be happy to work with you on this and discuss how to best solve
> these problems.
>
> Thanks,
>   -Steve
>
> On 6/16/2016 6:10 AM, Rémi Cura wrote:
>> Hey dear list,
>> I'm currently considering extending postgis geocoder.
>>
>> I have two needs:
>>  - to work with French data
>>  - to perform temporal geocoding (you give an addresses *and *a date,
>> you get geographic position).
>>
>> For the moment I'm trying to understand how the geocoder works with USA
>> data.
>>
>> I imported Massachusetts for a test, and everything seems okay except
>> that my tiger.state table is empty, which prevents geocoding.
>>
>> Notwithstanding the import step, and the adress normaliser, it seems to
>> boil down to a large query in the function tiger.geocode_address().
>>
>>
>> I would be delighted to have some pointers on the design of the
>> postgis_tiger_geocoder extension.
>> In perticular, I don't understand the adress standardizer business at
>> all (yeah, I found some doc, no, I don't understand it well)
>>
>>
>> Many thanks for help,
>> Cheers,
>> Rémi-C
>>
>>
>>
>>
>>
>>
>> -------------------------------------
>> WITH a AS
>> (
>>          SELECT   *
>>          FROM     (
>>                              SELECT     f.*,
>>                                         ad.side,
>>                                         ad.zip,
>>                                         ad.fromhn,
>>                                         ad.tohn,
>>                                         Rank() over(ORDER BY
>> Diff_zip(ad.zip,'01104')*2.00 +
>>                                         CASE
>>                                                    WHEN Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN') THEN 0
>>                                                    ELSE
>> Levenshtein_ignore_case(f.name <http://f.name>, Lower('FRANKLIN') )
>>                                         END +
>> Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
>>                                                    || ' '
>>                                                    ||
>> Coalesce('ST','')) ) +
>>                                         CASE
>>                                                    WHEN (
>>
>> Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
>>                                                    ELSE 1
>>                                         END +
>>                                         CASE
>>                                                    WHEN '1'::INTEGER
>> BETWEEN Least_hn(ad.fromhn,ad.tohn) AND        Greatest_hn(ad.fromhn,
>> ad.tohn) THEN 0
>>                                                    ELSE 4
>>                                         END +
>>                                         CASE
>>                                                    WHEN Lower('ST') =
>> Lower(f.suftypabrv)
>>                                                    OR
>> Lower('ST') = Lower(f.pretypabrv) THEN 0
>>                                                    ELSE 1
>>                                         END + Rate_attributes(NULL,
>> f.predirabrv, 'FRANKLIN', f.name <http://f.name> , 'ST', suftypabrv ,
>> NULL, sufdirabrv, prequalabr) ) AS rank
>>                              FROM
>> tiger.featnames
>> AS f
>>                              inner join
>> tiger.addr
>> AS ad
>>                              ON         (
>>                                                    f.tlid = ad.tlid)
>>                              WHERE      '25' = f.statefp
>>                              AND        '25' = ad.statefp
>>                              AND        (
>>                                                    Lower(f.fullname)
>> LIKE (Coalesce(NULL
>>                                                               || ' ','')
>>                                                               ||
>> Lower('FRANKLIN')
>>                                                               ||
>> '%')::text
>>                                         OR         Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN')
>>                                         OR         Soundex(f.name
>> <http://f.name>) = Soundex('FRANKLIN') )
>>                              AND        (
>>                                                    ad.zip =
>> ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
>>          ORDER BY rank limit 30 )
>>
>> , b AS (
>>                   SELECT     a.tlid,
>>                      a.fullname,
>>                      a.name <http://a.name>,
>>                      a.predirabrv,
>>                      a.suftypabrv,
>>                      a.sufdirabrv,
>>                      a.prequalabr,
>>                      a.pretypabrv,
>>                      b.the_geom,
>>                      tfidr,
>>                      tfidl,
>>                      a.side ,
>>                      a.fromhn,
>>                      a.tohn,
>>                      a.zip,
>>                      p.name <http://p.name> AS place
>>                   FROM       a
>>                   inner join tiger.edges AS b
>>                   ON         (
>>                             a.statefp = b.statefp
>>                      AND        a.tlid = b.tlid )
>>                   inner join tiger.faces AS f
>>                   ON         (
>>                             '25' = f.statefp
>>                      AND        ( (
>>                                       b.tfidl = f.tfid
>>                                    AND        a.side = 'L')
>>                             OR         (
>>                                       b.tfidr = f.tfid
>>                                    AND        a.side = 'R' ) ))
>>                   inner join tiger.place p
>>                   ON         (
>>                             '25' = p.statefp
>>                      AND        f.placefp = p.placefp )
>>                   WHERE      a.statefp = '25'
>>                   AND        b.statefp = '25' )
>>
>> , sub AS  (
>>     SELECT   tlid,
>>            predirabrv,
>>            coalesce(b.prequalabr
>>                 || ' ','' )
>>                 || b.name <http://b.name> AS fename,
>>            suftypabrv,
>>            sufdirabrv,
>>            fromhn,
>>            tohn,
>>            side,
>>            zip,
>>            rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name
>> <http://b.name> , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
>>            CASE
>>                 WHEN '1'::INTEGER IS NULL
>>                 OR       b.fromhn IS NULL THEN 20
>>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn, b.tohn)
>>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>>                 AND      (
>>                           '1'::INTEGER % 2) =
>> (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
>>                 WHEN '1'::    INTEGER >= least_hn(b.fromhn,b.tohn)
>>                 AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>> THEN 2
>>                 ELSE ((1.0 -
>> (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC /
>> (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text)))
>> )) * 5)::INTEGER + 5
>>            END::INTEGER AS sub_rating,
>>            '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
>>       AND      '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>>       AND      (
>>                 '1' % 2)::NUMERIC::INTEGER =
>> (to_number(b.fromhn,'99999999') % 2) AS exact_address,
>>            b.name <http://b.name>,
>>            b.prequalabr,
>>            b.pretypabrv,
>>            b.tfidr,
>>            b.tfidl,
>>            b.the_geom,
>>            b.place
>>       FROM     b
>>       ORDER BY 10 ,
>>            11 DESC limit 20 )
>>
>>
>>
>> --, foo AS (
>>
>>     SELECT DISTINCT
>>          ON (
>>
>>  sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv)
>> ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
>>                  sub.fename,
>>                  coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
>>                  sub.sufdirabrv                           AS fedirs,
>>                  sub.place ,
>>
>>  s.stusps
>> AS state,
>>
>>  sub.zip
>> AS zip,
>>                  interpolate_from_address('1', sub.fromhn, sub.tohn,
>> sub.the_geom,
>> sub.side)
>> AS address_geom,
>>                  (sub.sub_rating + least(coalesce(diff_zip('01104' ,
>> sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD',
>> sub.place),5) )::INTEGER AS sub_rating,
>>
>>  sub.exact_address
>> AS exact_address,
>>                  sub.tohn,
>>                  sub.fromhn
>>          FROM           sub
>>          join            tiger.state s
>>          ON              (
>>                          '25' = s.statefp)
>>          ORDER BY        1,
>>                  2,
>>                  3,
>>                  4,
>>                  5,
>>                  6,
>>                  7,
>>                  9 limit 20)
>>
>> SELECT   *
>> FROM foo
>> ORDER BY sub_rating,
>>          exact_address DESC limit 100
>> -------------------------------------
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus




More information about the postgis-devel mailing list