[postgis-users] Various ways to handle addresses in postgresql

Shaozhong SHI shishaozhong at gmail.com
Wed Apr 29 06:15:51 PDT 2020


Hi, Steve,

I have created the address_standardizer extension.

It is not like other functions.  How can I review its code?  Does it need to
be adapted to handle other countries' addresses?

Regards,

Shao

On Sun, 26 Apr 2020 at 17:23, Stephen Woodbridge <stephenwoodbridge37 at gmail.com> wrote:

> Shao,
>
> I just remembered the lex, gaz, and rules data is in a separate
> extension. The correct way to install it is with:
>
> create extension address_standardizer_data_us;
>
> -Steve
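
For reference, a minimal sketch of using that bundled data (per the PostGIS
documentation the address_standardizer_data_us extension installs tables named
us_lex, us_gaz and us_rules, which are passed by name to standardize_address();
exact table names may vary by version):

    create extension if not exists address_standardizer;
    create extension if not exists address_standardizer_data_us;

    -- the data extension's tables are referenced by name as the first
    -- three arguments of standardize_address()
    select * from standardize_address('us_lex', 'us_gaz', 'us_rules',
                                      '1 Devonshire Place, Boston, MA 02109');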
>
> On 4/26/2020 8:09 AM, Shaozhong SHI wrote:
> > Hi, Steve,
> >
> > Thanks.
> >
> > 2 questions.
> >
> > 1.  How can we remove things like "Room 2a", "Buildings 2-6b", etc. with
> > regexp_replace?
> > 2.  Once the extensions are created, can these functions be adapted?  Is
> > the source code available?  I will see whether to put it into a project, so
> > that our programmers have work to do.
> >
> > Regards,
> >
> > Shao
> >
> > On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge
> > <stephenwoodbridge37 at gmail.com> wrote:
> >
> >     On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
> >     > Hi, Steve,
> >     >
> >     > Many thanks.  Please send me the link to parse_address() and
> >     > standardize_address().
> >
> >     If you already have PostGIS installed, then:
> >
> >     create extension address_standardizer;
> >
> >     # \df parse_address
> >                                List of functions
> >      Schema |     Name      | Result data type | Argument data types | Type
> >     --------+---------------+------------------+---------------------+--------
> >      public | parse_address | record           | text, OUT num text, OUT street text, OUT street2 text, OUT address1 text, OUT city text, OUT state text, OUT zip text, OUT zipplus text, OUT country text | normal
> >     (1 row)
> >
> >     # select * from parse_address('123-2 main street city ny');
> >       num  |   street    | street2 |     address1      | city | state | zip | zipplus | country
> >     -------+-------------+---------+-------------------+------+-------+-----+---------+---------
> >      123-2 | main street |         | 123-2 main street | city | NY    |     |         | US
> >     (1 row)
> >
> >     # \df standardize*
> >                                List of functions
> >      Schema |        Name         | Result data type | Argument data types | Type
> >     --------+---------------------+------------------+----------------------------------------------------------------+--------
> >      public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, address text            | normal
> >      public | standardize_address | stdaddr          | lextab text, gaztab text, rultab text, micro text, macro text  | normal
> >
> >     You need tables for the lexicon, gazetteer, and rules, which should be
> >     included in the extension but I'm not seeing them. So you can grab these
> >     from:
> >
> >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
> >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
> >     https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql
> >
> >     and load them like:
> >
> >     psql mydb -f us-gaz.sql
> >     psql mydb -f us-lex.sql
> >     psql mydb -f us-rules.sql
> >
> >     # select * from standardize_address('lex', 'gaz', 'rules', '123-2 main street city ny');
> >      building | house_num | predir | qual | pretype |  name  | suftype | sufdir | ruralroute | extra | city |  state   | country | postcode | box | unit
> >     ----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
> >               | 123       |        |      |         | 2 MAIN | STREET  |        |            |       | CITY | NEW YORK | USA     |          |     |
> >     (1 row)
> >
> >
> >     This is a good example of why parsing addresses is so difficult. The
> >     rules for standardize_address do not account for a house number like
> >     "123-2", but the regexps in parse_address do handle it. It is easy to
> >     get 80% right and very hard to get much above that.
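
One possible workaround, a rough sketch not taken from either function, is to
pre-clean a unit-style suffix on the house number with regexp_replace before
standardizing:

    -- collapse "123-2 main street ..." to "123 main street ..." so the
    -- standardizer rules see a plain house number (assumes the lex/gaz/rules
    -- tables loaded above)
    select * from standardize_address('lex', 'gaz', 'rules',
        regexp_replace('123-2 main street city ny', '^([0-9]+)-[0-9]+', '\1'));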
> >
> >     -Steve
> >
> >
> >     >
> >     > I need to find these first before test-running.
> >     >
> >     > Regards,
> >     >
> >     > Shao
> >     >
> >     > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
> >     > <stephenwoodbridge37 at gmail.com> wrote:
> >     >
> >     >     Shao,
> >     >
> >     >     '^( *Building *[0-9]+)?[- 0-9]*'
> >     >
> >     >     or something like that should do it. But I think you will find that a
> >     >     more robust solution is to use parse_address() and/or
> >     >     standardize_address(), as they will recognize a lot of other address
> >     >     constructs, like "apt 3a" for example.
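
A minimal sketch of applying that pattern with regexp_replace, assuming a
hypothetical table addresses with a text column addr (the 'i' flag is added
here so "building" matches in any case):

    -- strip a leading "Building N" plus any following house/unit digits,
    -- hyphens and spaces from the front of the address
    select regexp_replace(addr, '^( *Building *[0-9]+)?[- 0-9]*', '', 'i') as trimmed
    from addresses;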
> >     >
> >     >     parse_address() takes a text field and breaks it into "house number
> >     >     street name" and "city state zip", but it only works well in North
> >     >     America.
> >     >
> >     >     standardize_address(), which comes with PostGIS, breaks the address
> >     >     down into its components and can separate out things like buildings
> >     >     and apartment/unit specifiers, so you can then take the fields you are
> >     >     interested in and recombine just those in a new string. Again, this
> >     >     works best in North America.
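
A hedged sketch of that recombination, using the stdaddr column names shown
earlier in this thread (concat_ws skips NULL components; blank ones may still
need trimming):

    -- rebuild just the street-level line from the standardized components,
    -- deliberately dropping building/unit fields
    select concat_ws(' ', house_num, predir, name, suftype, sufdir) as street_line
    from standardize_address('lex', 'gaz', 'rules', '123-2 main street city ny');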
> >     >
> >     >     My GitHub address-standardizer is built to recognize addresses for
> >     >     most countries, and it can also be configured to recognize the address
> >     >     standards of any country without too much effort. It compiles and
> >     >     installs as a PostgreSQL extension.
> >     >
> >     >     Addresses are generally very messy, and unless your addresses are
> >     >     very simple you will be constantly fighting with this or that
> >     >     exception.
> >     >
> >     >     -Steve
> >     >
> >     >     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
> >     >     > Is there a way to left-trim the address, including the building and
> >     >     > number?
> >     >     >
> >     >     > Building 3  21-1              Great Avenue, a city, a country, this
> >     >     > planet.
> >     >     >
> >     >     > How can we take away those things which are too local to an address?
> >     >     >
> >     >     > Regards,
> >     >     >
> >     >     > Shao
> >     >     >
> >     >     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
> >     >     > <shishaozhong at gmail.com> wrote:
> >     >     >
> >     >     >     I find this is a simple but important question.
> >     >     >
> >     >     >     How best to split the number from the rest of the address?
> >     >     >
> >     >     >     For instance, one tricky one is as follows:
> >     >     >
> >     >     >     21-1 Great Avenue, a city, a country, this planet
> >     >     >
> >     >     >     How to turn this into the following:
> >     >     >
> >     >     >     column 1,         column 2
> >     >     >
> >     >     >       21-1              Great Avenue, a city, a country, this planet
> >     >     >
> >     >     >     Note:  there is a hyphen in  21-1
> >     >     >
> >     >     >     Any clue?
> >     >     >
> >     >     >     Regards,
> >     >     >
> >     >     >     Shao
> >     >     >
> >     >     >
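
A minimal regexp sketch for that split, assuming a hypothetical table addresses
with a text column addr (regexp_match needs PostgreSQL 10+; the pattern and
names are illustrative only, not from the thread):

    -- group 1 = leading house number (hyphen allowed), group 2 = the rest
    select m[1] as house_num, m[2] as rest
    from addresses
    cross join lateral regexp_match(addr, '^\s*([0-9]+(?:-[0-9]+)?)\s+(.*)$') as m;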
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users

