[postgis-users] Various ways to handle addresses in postgresql

Shaozhong SHI shishaozhong at gmail.com
Sun Jan 10 06:57:23 PST 2021


Hi, Steve,

Another solution appeals me most is as follows:

Given a space delimited full address line, we can parse it to correct
BS7666 format.

Something like house number, street, area, city, postcode

E.g., Nest 2 my nesting place 1B Great Avenue Forest Park London WS22 5TT
Can you enlighten me about that?

Regards
,
David

On Saturday, 9 January 2021, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:

> David,
>
> This is the link to the address standardizer:
> https://github.com/woodbri/address-standardizer
>
> This is a link to all my code that I developed consulting. It includes a
> few SQL geocoders based on the code above. And has some README files
> discussing how to build a geocoder which is the basis for how the geocoders
> work.
>
> https://github.com/woodbri/imaptools.com
>
> this is the geocoder for Tiger data, but the code is essentially the same
> for every country because the when you load country specific data into the
> database it goes into its own table and then you standardize that data into
> stdstreets table and all queries are done against the stdstreets table and
> you only have to tweak the address range interpolation function which needs
> to access the source streets table for the geometry and house number ranges.
>
> https://github.com/woodbri/imaptools.com/blob/master/sql-scr
> ipts/geocoder/prep-tiger-geo-new.sql
>
> I would approach this by:
>
> 1. get the address standardizer compiled and installed. I can help if you
> run into problems or have questions.
> 2. load your UK street data into rawdata schema, ideally it would be best
> if we can create a table/view that presents this data as a single table
> where each record represents one side of the street and one jurisdiction
> this may mean that a single record in your source data will generate
> multiple records in this table/view (this greatly simplifies the coding and
> performance later)
> 3. look at the prep-tiger-geo-new.sql file
> 4. create a stdstreets table and standardize your table/view data into it
> 5. look at standardization failures and adjust lexicon and grammar as
> needed
> 6. loop back to 4 until good enough
> 7. load functions from prep-tiger-geo-new.sql file and adjust any for your
> data
> 8. try it out!
>
> -Steve
>
>
> On 1/9/2021 10:22 AM, Shaozhong SHI wrote:
>
>> Hi, Stephen,
>>
>> Many thanks.  We are interested in it is working with the UK addresses.
>>
>> Please send me the link to this.
>>
>> Regards,
>>
>> David
>>
>> On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge <
>> stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>>
>> wrote:
>>
>>     David,
>>
>>     Yup and this is just one a dozens of cases that you have to deal
>>     with. You are dealing with a natural language processing problem.
>>     And you have to deal with human input that has typos and
>>     abbreviations.
>>
>>     These issues are what the address standardizer fixes. It tokenized
>>     the address and uses the gazette to standardize the terms and then
>>     classifies each term and assigns it to part of the address based
>>     on a grammar.
>>
>>     So there is a simple solution, use my address standardizer, it is
>>     free, MIT license, it has a sample lexicon/ gazette and grammar
>>     for the UK, it is easy to modify these to fit your needs, and it
>>     just works. Oh if you want to do another county it also has sample
>>     files for 25 countries.
>>
>>     Sent from my iPhone
>>
>>     On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski
>>>     <me at komzpa.net <mailto:me at komzpa.net>> wrote:
>>>
>>>     
>>>     Hello,
>>>
>>>     People make neural networks for this kind of task:
>>>
>>>     https://github.com/openvenues/libpostal
>>>     <https://github.com/openvenues/libpostal>
>>>
>>>     сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI
>>>     <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>> напісаў:
>>>
>>>         Hi, Steve W,
>>>
>>>         it is easy to parse addresses as tokens. But it is difficult
>>>         to put tokens in right columns, due to that the same address
>>>         could be expressed with partial address or full address.
>>>
>>>         The same address can be written like, Flat 1 122 Great Avenue
>>>         London UK, or Flat 1 122 Greet Avenue Central London London
>>>         United Kingdom.
>>>
>>>         When this happens, each address has different number of
>>>         tokens, so different numbers of tokens.  Is there a way to
>>>         deal with this issue so that each token can get into right
>>>         column?
>>>
>>>         Please enlighten me.
>>>
>>>         Regards,
>>>
>>>         David
>>>
>>>         On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge
>>>         <stephenwoodbridge37 at gmail.com
>>>         <mailto:stephenwoodbridge37 at gmail.com>> wrote:
>>>
>>>             And I have create an address-standardizer project here
>>>             https://github.com/woodbri/address-standardizer
>>>             <https://github.com/woodbri/address-standardizer> which
>>>             is user
>>>             configurable. I might be over kill is you just want to
>>>             strip off the
>>>             number, in which case you might just use a SQL regexp
>>>             replace to remove it.
>>>
>>>             -Steve W
>>>
>>>             On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>>             > PostGIS has address_standardizer extension that includes
>>>             > parse_address() and standardize_address() functions.
>>>             >
>>>             > -Steve W
>>>             >
>>>             > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>>             >> > handle addresses in postgresql
>>>             >>
>>>             >> maybe you can use the
>>>             https://github.com/openvenues/libpostal
>>>             <https://github.com/openvenues/libpostal> library
>>>             >> with your favorite language bindings ( Python / Ruby /
>>>             Go / PHP /
>>>             >> Node / R / Java  ...)
>>>             >>
>>>             >> or as a Postgres database extension:
>>>             >>
>>>             https://info.crunchydata.com/blog/quick-and-dirty-address-ma
>>> tching-with-libpostal
>>>             <https://info.crunchydata.com/blog/quick-and-dirty-address-m
>>> atching-with-libpostal>
>>>
>>>             >>
>>>             >> https://github.com/pramsey/pgsql-postal
>>>             <https://github.com/pramsey/pgsql-postal>
>>>             >>
>>>             >> Regards,
>>>             >>  Imre
>>>             >>
>>>             >>
>>>             >>
>>>             >>
>>>             >> Shaozhong SHI <shishaozhong at gmail.com
>>>             <mailto:shishaozhong at gmail.com>
>>>             >> <mailto:shishaozhong at gmail.com
>>>             <mailto:shishaozhong at gmail.com>>> ezt írta (időpont:
>>>             2020. ápr. 25.,
>>>             >> Szo, 2:49):
>>>             >>
>>>             >>     I find this is a simple, but important question.
>>>             >>
>>>             >>     How best to split numbers and the rest of address?
>>>             >>
>>>             >>     For instance, one tricky one is as follows:
>>>             >>
>>>             >>     21-1 Great Avenue, a city, a country, this planet
>>>             >>
>>>             >>     How to turn this into the following:
>>>             >>
>>>             >>     column 1,       column 2
>>>             >>
>>>             >>       21-1              Great Avenue, a city, a
>>>             country, this planet
>>>             >>
>>>             >>     Note:  there is a hyphen in 21-1
>>>             >>
>>>             >>     Any clue?
>>>             >>
>>>             >>     Regards,
>>>             >>
>>>             >>     Shao
>>>             >> _______________________________________________
>>>             >>     postgis-users mailing list
>>>             >> postgis-users at lists.osgeo.org
>>>             <mailto:postgis-users at lists.osgeo.org>
>>>             <mailto:postgis-users at lists.osgeo.org
>>>             <mailto:postgis-users at lists.osgeo.org>>
>>>             >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>             >>
>>>             >>
>>>             >> _______________________________________________
>>>             >> postgis-users mailing list
>>>             >> postgis-users at lists.osgeo.org
>>>             <mailto:postgis-users at lists.osgeo.org>
>>>             >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>             >
>>>
>>>             _______________________________________________
>>>             postgis-users mailing list
>>>             postgis-users at lists.osgeo.org
>>>             <mailto:postgis-users at lists.osgeo.org>
>>>             https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>             <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>>>         _______________________________________________
>>>         postgis-users mailing list
>>>         postgis-users at lists.osgeo.org
>>>         <mailto:postgis-users at lists.osgeo.org>
>>>         https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>         <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>>>     _______________________________________________
>>>     postgis-users mailing list
>>>     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>>>     https://lists.osgeo.org/mailman/listinfo/postgis-users
>>>     <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>>     _______________________________________________
>>     postgis-users mailing list
>>     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>>     https://lists.osgeo.org/mailman/listinfo/postgis-users
>>     <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210110/f8572ce2/attachment.html>


More information about the postgis-users mailing list