[postgis-devel] postgis_tiger geocoding extension : France & temporal geocoding
Stephen Woodbridge
woodbri at swoodbridge.com
Thu Jun 16 06:15:41 PDT 2016
Hi Remi,
I have started a project to create an address standardizer that can be
internationalized with the longer term goal of redesigning the postgis
geocoder to work with data from any data sources.
https://github.com/woodbri/address-standardizer
The design for a generic geocoder works like this:
* load street data into a table with geometry
* standardize the street data into a standardized table
* given a query address
* standardize it
* generate a query to search for candidate records in the standized table
* score/filter the candidate records
* fetch the geometry and compute the location
The address-standardizer needs some performance tuning as it has
problems when the lexicon gets large and/or when the grammar gets too
complex.
I'm still fairly new at C++, and would be happy to get pull requests or
a code review.
I have a sample lexicon and grammar for France and it should be straight
forward to add a dimension like date to the scheme above and filter
candidate records based on dates in the scoring process.
I would be happy to work with you on this and discuss how to best solve
these problems.
Thanks,
-Steve
On 6/16/2016 6:10 AM, Rémi Cura wrote:
> Hey dear list,
> I'm currently considering extending postgis geocoder.
>
> I have two needs:
> - to work with French data
> - to perform temporal geocoding (you give an addresses *and *a date,
> you get geographic position).
>
> For the moment I'm trying to understand how the geocoder works with USA
> data.
>
> I imported Massachusetts for a test, and everything seems okay except
> that my tiger.state table is empty, which prevents geocoding.
>
> Notwithstanding the import step, and the adress normaliser, it seems to
> boil down to a large query in the function tiger.geocode_address().
>
>
> I would be delighted to have some pointers on the design of the
> postgis_tiger_geocoder extension.
> In perticular, I don't understand the adress standardizer business at
> all (yeah, I found some doc, no, I don't understand it well)
>
>
> Many thanks for help,
> Cheers,
> Rémi-C
>
>
>
>
>
>
> -------------------------------------
> WITH a AS
> (
> SELECT *
> FROM (
> SELECT f.*,
> ad.side,
> ad.zip,
> ad.fromhn,
> ad.tohn,
> Rank() over(ORDER BY
> Diff_zip(ad.zip,'01104')*2.00 +
> CASE
> WHEN Lower(f.name
> <http://f.name>) = Lower('FRANKLIN') THEN 0
> ELSE
> Levenshtein_ignore_case(f.name <http://f.name>, Lower('FRANKLIN') )
> END +
> Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
> || ' '
> || Coalesce('ST','')) ) +
> CASE
> WHEN (
>
> Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
> ELSE 1
> END +
> CASE
> WHEN '1'::INTEGER
> BETWEEN Least_hn(ad.fromhn,ad.tohn) AND Greatest_hn(ad.fromhn,
> ad.tohn) THEN 0
> ELSE 4
> END +
> CASE
> WHEN Lower('ST') =
> Lower(f.suftypabrv)
> OR
> Lower('ST') = Lower(f.pretypabrv) THEN 0
> ELSE 1
> END + Rate_attributes(NULL,
> f.predirabrv, 'FRANKLIN', f.name <http://f.name> , 'ST', suftypabrv ,
> NULL, sufdirabrv, prequalabr) ) AS rank
> FROM
> tiger.featnames
> AS f
> inner join
> tiger.addr
> AS ad
> ON (
> f.tlid = ad.tlid)
> WHERE '25' = f.statefp
> AND '25' = ad.statefp
> AND (
> Lower(f.fullname)
> LIKE (Coalesce(NULL
> || ' ','')
> ||
> Lower('FRANKLIN')
> || '%')::text
> OR Lower(f.name
> <http://f.name>) = Lower('FRANKLIN')
> OR Soundex(f.name
> <http://f.name>) = Soundex('FRANKLIN') )
> AND (
> ad.zip =
> ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
> ORDER BY rank limit 30 )
>
> , b AS (
> SELECT a.tlid,
> a.fullname,
> a.name <http://a.name>,
> a.predirabrv,
> a.suftypabrv,
> a.sufdirabrv,
> a.prequalabr,
> a.pretypabrv,
> b.the_geom,
> tfidr,
> tfidl,
> a.side ,
> a.fromhn,
> a.tohn,
> a.zip,
> p.name <http://p.name> AS place
> FROM a
> inner join tiger.edges AS b
> ON (
> a.statefp = b.statefp
> AND a.tlid = b.tlid )
> inner join tiger.faces AS f
> ON (
> '25' = f.statefp
> AND ( (
> b.tfidl = f.tfid
> AND a.side = 'L')
> OR (
> b.tfidr = f.tfid
> AND a.side = 'R' ) ))
> inner join tiger.place p
> ON (
> '25' = p.statefp
> AND f.placefp = p.placefp )
> WHERE a.statefp = '25'
> AND b.statefp = '25' )
>
> , sub AS (
> SELECT tlid,
> predirabrv,
> coalesce(b.prequalabr
> || ' ','' )
> || b.name <http://b.name> AS fename,
> suftypabrv,
> sufdirabrv,
> fromhn,
> tohn,
> side,
> zip,
> rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name
> <http://b.name> , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
> CASE
> WHEN '1'::INTEGER IS NULL
> OR b.fromhn IS NULL THEN 20
> WHEN '1':: INTEGER >= least_hn(b.fromhn, b.tohn)
> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
> AND (
> '1'::INTEGER % 2) =
> (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
> WHEN '1':: INTEGER >= least_hn(b.fromhn,b.tohn)
> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn) THEN 2
> ELSE ((1.0 -
> (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC /
> (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text)))
> )) * 5)::INTEGER + 5
> END::INTEGER AS sub_rating,
> '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
> AND (
> '1' % 2)::NUMERIC::INTEGER =
> (to_number(b.fromhn,'99999999') % 2) AS exact_address,
> b.name <http://b.name>,
> b.prequalabr,
> b.pretypabrv,
> b.tfidr,
> b.tfidl,
> b.the_geom,
> b.place
> FROM b
> ORDER BY 10 ,
> 11 DESC limit 20 )
>
>
>
> --, foo AS (
>
> SELECT DISTINCT
> ON (
>
> sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv)
> ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
> sub.fename,
> coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
> sub.sufdirabrv AS fedirs,
> sub.place ,
>
> s.stusps
> AS state,
>
> sub.zip
> AS zip,
> interpolate_from_address('1', sub.fromhn, sub.tohn,
> sub.the_geom,
> sub.side)
> AS address_geom,
> (sub.sub_rating + least(coalesce(diff_zip('01104' ,
> sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD',
> sub.place),5) )::INTEGER AS sub_rating,
>
> sub.exact_address
> AS exact_address,
> sub.tohn,
> sub.fromhn
> FROM sub
> join tiger.state s
> ON (
> '25' = s.statefp)
> ORDER BY 1,
> 2,
> 3,
> 4,
> 5,
> 6,
> 7,
> 9 limit 20)
>
> SELECT *
> FROM foo
> ORDER BY sub_rating,
> exact_address DESC limit 100
> -------------------------------------
>
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the postgis-devel
mailing list