[postgis-devel] postgis_tiger geocoding extension : France & temporal geocoding
Stephen Woodbridge
woodbri at swoodbridge.com
Thu Jun 16 06:30:35 PDT 2016
Also, I forgot to mention that Reginia worked on windows builds for this
and I maintain Windows development builds here:
http://imaptools.com:8080/dl/address-standardizer2/
-Steve
On 6/16/2016 9:15 AM, Stephen Woodbridge wrote:
> Hi Remi,
>
> I have started a project to create an address standardizer that can be
> internationalized with the longer term goal of redesigning the postgis
> geocoder to work with data from any data sources.
>
> https://github.com/woodbri/address-standardizer
>
> The design for a generic geocoder works like this:
>
> * load street data into a table with geometry
> * standardize the street data into a standardized table
> * given a query address
> * standardize it
> * generate a query to search for candidate records in the standized table
> * score/filter the candidate records
> * fetch the geometry and compute the location
>
> The address-standardizer needs some performance tuning as it has
> problems when the lexicon gets large and/or when the grammar gets too
> complex.
>
> I'm still fairly new at C++, and would be happy to get pull requests or
> a code review.
>
> I have a sample lexicon and grammar for France and it should be straight
> forward to add a dimension like date to the scheme above and filter
> candidate records based on dates in the scoring process.
>
> I would be happy to work with you on this and discuss how to best solve
> these problems.
>
> Thanks,
> -Steve
>
> On 6/16/2016 6:10 AM, Rémi Cura wrote:
>> Hey dear list,
>> I'm currently considering extending postgis geocoder.
>>
>> I have two needs:
>> - to work with French data
>> - to perform temporal geocoding (you give an addresses *and *a date,
>> you get geographic position).
>>
>> For the moment I'm trying to understand how the geocoder works with USA
>> data.
>>
>> I imported Massachusetts for a test, and everything seems okay except
>> that my tiger.state table is empty, which prevents geocoding.
>>
>> Notwithstanding the import step, and the adress normaliser, it seems to
>> boil down to a large query in the function tiger.geocode_address().
>>
>>
>> I would be delighted to have some pointers on the design of the
>> postgis_tiger_geocoder extension.
>> In perticular, I don't understand the adress standardizer business at
>> all (yeah, I found some doc, no, I don't understand it well)
>>
>>
>> Many thanks for help,
>> Cheers,
>> Rémi-C
>>
>>
>>
>>
>>
>>
>> -------------------------------------
>> WITH a AS
>> (
>> SELECT *
>> FROM (
>> SELECT f.*,
>> ad.side,
>> ad.zip,
>> ad.fromhn,
>> ad.tohn,
>> Rank() over(ORDER BY
>> Diff_zip(ad.zip,'01104')*2.00 +
>> CASE
>> WHEN Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN') THEN 0
>> ELSE
>> Levenshtein_ignore_case(f.name <http://f.name>, Lower('FRANKLIN') )
>> END +
>> Levenshtein_ignore_case(f.fullname, Lower('FRANKLIN'
>> || ' '
>> ||
>> Coalesce('ST','')) ) +
>> CASE
>> WHEN (
>>
>> Greatest_hn(ad.fromhn,ad.tohn) % 2)::INTEGER = ('1' % 2)::INTEGER THEN 0
>> ELSE 1
>> END +
>> CASE
>> WHEN '1'::INTEGER
>> BETWEEN Least_hn(ad.fromhn,ad.tohn) AND Greatest_hn(ad.fromhn,
>> ad.tohn) THEN 0
>> ELSE 4
>> END +
>> CASE
>> WHEN Lower('ST') =
>> Lower(f.suftypabrv)
>> OR
>> Lower('ST') = Lower(f.pretypabrv) THEN 0
>> ELSE 1
>> END + Rate_attributes(NULL,
>> f.predirabrv, 'FRANKLIN', f.name <http://f.name> , 'ST', suftypabrv ,
>> NULL, sufdirabrv, prequalabr) ) AS rank
>> FROM
>> tiger.featnames
>> AS f
>> inner join
>> tiger.addr
>> AS ad
>> ON (
>> f.tlid = ad.tlid)
>> WHERE '25' = f.statefp
>> AND '25' = ad.statefp
>> AND (
>> Lower(f.fullname)
>> LIKE (Coalesce(NULL
>> || ' ','')
>> ||
>> Lower('FRANKLIN')
>> ||
>> '%')::text
>> OR Lower(f.name
>> <http://f.name>) = Lower('FRANKLIN')
>> OR Soundex(f.name
>> <http://f.name>) = Soundex('FRANKLIN') )
>> AND (
>> ad.zip =
>> ANY('{01102,01103,01104,01105,01106}'::VARCHAR[]) ) ) AS foo
>> ORDER BY rank limit 30 )
>>
>> , b AS (
>> SELECT a.tlid,
>> a.fullname,
>> a.name <http://a.name>,
>> a.predirabrv,
>> a.suftypabrv,
>> a.sufdirabrv,
>> a.prequalabr,
>> a.pretypabrv,
>> b.the_geom,
>> tfidr,
>> tfidl,
>> a.side ,
>> a.fromhn,
>> a.tohn,
>> a.zip,
>> p.name <http://p.name> AS place
>> FROM a
>> inner join tiger.edges AS b
>> ON (
>> a.statefp = b.statefp
>> AND a.tlid = b.tlid )
>> inner join tiger.faces AS f
>> ON (
>> '25' = f.statefp
>> AND ( (
>> b.tfidl = f.tfid
>> AND a.side = 'L')
>> OR (
>> b.tfidr = f.tfid
>> AND a.side = 'R' ) ))
>> inner join tiger.place p
>> ON (
>> '25' = p.statefp
>> AND f.placefp = p.placefp )
>> WHERE a.statefp = '25'
>> AND b.statefp = '25' )
>>
>> , sub AS (
>> SELECT tlid,
>> predirabrv,
>> coalesce(b.prequalabr
>> || ' ','' )
>> || b.name <http://b.name> AS fename,
>> suftypabrv,
>> sufdirabrv,
>> fromhn,
>> tohn,
>> side,
>> zip,
>> rate_attributes(NULL, predirabrv, 'FRANKLIN', b.name
>> <http://b.name> , 'ST', suftypabrv , NULL, sufdirabrv, prequalabr) +
>> CASE
>> WHEN '1'::INTEGER IS NULL
>> OR b.fromhn IS NULL THEN 20
>> WHEN '1':: INTEGER >= least_hn(b.fromhn, b.tohn)
>> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>> AND (
>> '1'::INTEGER % 2) =
>> (to_number(b.fromhn,'99999999') % 2)::INTEGER THEN 0
>> WHEN '1':: INTEGER >= least_hn(b.fromhn,b.tohn)
>> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>> THEN 2
>> ELSE ((1.0 -
>> (least_hn('1'::text,least_hn(b.fromhn,b.tohn)::text)::NUMERIC /
>> (greatest(1,greatest_hn('1'::text,greatest_hn(b.fromhn,b.tohn)::text)))
>> )) * 5)::INTEGER + 5
>> END::INTEGER AS sub_rating,
>> '1'::INTEGER >= least_hn(b.fromhn,b.tohn)
>> AND '1'::INTEGER <= greatest_hn(b.fromhn,b.tohn)
>> AND (
>> '1' % 2)::NUMERIC::INTEGER =
>> (to_number(b.fromhn,'99999999') % 2) AS exact_address,
>> b.name <http://b.name>,
>> b.prequalabr,
>> b.pretypabrv,
>> b.tfidr,
>> b.tfidl,
>> b.the_geom,
>> b.place
>> FROM b
>> ORDER BY 10 ,
>> 11 DESC limit 20 )
>>
>>
>>
>> --, foo AS (
>>
>> SELECT DISTINCT
>> ON (
>>
>> sub.predirabrv,sub.fename,coalesce(sub.suftypabrv, sub.pretypabrv)
>> ,sub.sufdirabrv,sub.place,s.stusps,sub.zip) sub.predirabrv AS fedirp,
>> sub.fename,
>> coalesce(sub.suftypabrv, sub.pretypabrv) AS fetype,
>> sub.sufdirabrv AS fedirs,
>> sub.place ,
>>
>> s.stusps
>> AS state,
>>
>> sub.zip
>> AS zip,
>> interpolate_from_address('1', sub.fromhn, sub.tohn,
>> sub.the_geom,
>> sub.side)
>> AS address_geom,
>> (sub.sub_rating + least(coalesce(diff_zip('01104' ,
>> sub.zip),0), 20)*2.00 + coalesce(levenshtein_ignore_case('SPRINGFIELD',
>> sub.place),5) )::INTEGER AS sub_rating,
>>
>> sub.exact_address
>> AS exact_address,
>> sub.tohn,
>> sub.fromhn
>> FROM sub
>> join tiger.state s
>> ON (
>> '25' = s.statefp)
>> ORDER BY 1,
>> 2,
>> 3,
>> 4,
>> 5,
>> 6,
>> 7,
>> 9 limit 20)
>>
>> SELECT *
>> FROM foo
>> ORDER BY sub_rating,
>> exact_address DESC limit 100
>> -------------------------------------
>>
>>
>> _______________________________________________
>> postgis-devel mailing list
>> postgis-devel at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/postgis-devel
>>
>
>
> ---
> This email has been checked for viruses by Avast antivirus software.
> https://www.avast.com/antivirus
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the postgis-devel
mailing list