[postgis-users] Various ways to handle addresses in postgresql

Shaozhong SHI shishaozhong at gmail.com
Sun Apr 26 05:09:38 PDT 2020


Hi, Steve,

Thanks.

2 questions.

1.  How can we remove things like Room 2a, Buildings 2-6b and etc with
regexp replace?
2.  Once extensions created, can these functions be adapted?  Are codes
available ?    I will see whether to put it into a project, so that our
programmers can have work to do.

Regards,

Shao

On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:

> On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
> > Hi, Steve,
> >
> > Many thanks.  Please send me the link to parse_address() and
> > standardize_address().
>
> If you already have postGIS installed then
>
> create extension address_standardizer;
>
> # \df parse_address
> List of functions
>   Schema |     Name      | Result data type | Argument data types |  Type
>
> --------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
>   public | parse_address | record           | text, OUT num text, OUT
> street text, OUT street2 text, OUT address1 text, OUT city text, OUT
> state text, OUT zip text, OUT zipplus text, OUT country text | normal
> (1 row)
>
> # select * from parse_address('123-2 main street city ny');
>    num  |   street    | street2 |     address1      | city | state | zip
> | zipplus | country
>
> -------+-------------+---------+-------------------+------+-------+-----+---------+---------
>   123-2 | main street |         | 123-2 main street | city | NY |
> |         | US
> (1 row)
>
> # \df standardize*
>                                                      List of functions
>   Schema |        Name         | Result data type |
> Argument data types                      | Type
>
> --------+---------------------+------------------+---------------------------------------------------------------+--------
>   public | standardize_address | stdaddr          | lextab text, gaztab
> text, rultab text, address text           | normal
>   public | standardize_address | stdaddr          | lextab text, gaztab
> text, rultab text, micro text, macro text | normal
>
> You need tables for the lexicon, gazetteer, and rules, which should be
> included in the extension but I'm not seeing them. So you can grab these
> from:
>
>
> https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
>
> https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
>
> https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql
>
> and load them like:
>
> psql mydb -f us-gaz.sql
> psql mydb -f us-lex.sql
> psql mydb -f us-rules.sql
>
> # select * from standardize_address('lex', 'gaz', 'rules', '123-2 main
> street city ny');
>   building | house_num | predir | qual | pretype |  name  | suftype |
> sufdir | ruralroute | extra | city |  state   | country | postcode | box
> | unit
>
> ----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
>            | 123       |        |      |         | 2 MAIN | STREET
> |        |            |       | CITY | NEW YORK | USA     | |     |
> (1 row)
>
>
> This is a good example of why parsing addresses is so difficult. The
> rules for standardize_address do not account for a house number like
> "123-2", but the regexp in parse_address do handle it. It is easy to get
> the 80% right and very hard to get it much above that.
>
> -Steve
>
>
> >
> > I need to find these first before test-running.
> >
> > Regards,
> >
> > Shao
> >
> > On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge
> > <stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>>
> > wrote:
> >
> >     Shao,
> >
> >     '^( *Building *[0-9]+)?[- 0-9]*'
> >
> >     or something like that should do it. But I think you will find that a
> >     more robust solution is to use parse_address() and/or
> >     standardize_address() as they will recognize a lot of other address
> >     constructs, like "apt 3a" for for example.
> >
> >     parse_address() that a text field and breaks it into "house number
> >     street name" and "city state zip", but only works well in North
> >     America.
> >
> >     standardize_address() that comes with postGIS, breaks the address
> >     down
> >     into its components and can separate out things like buildings, and
> >     apartment/unit specifiers so you can then take the fields you are
> >     interested in and recombine just them in a new string. Again, this
> >     works
> >     best in North America.
> >
> >     My github address-standardizer is built to recognize address for most
> >     counties, but it can also be configured to recognize address
> >     standards
> >     for any county without too much effort. It compiles and installs as
> >     postgresql extension.
> >
> >     Addresses are generally very messy and unless your addresses are vary
> >     simple you will be constantly fighting with this or that exception.
> >
> >     -Steve
> >
> >     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
> >     > Is there a way to left trim including the building and number?
> >     >
> >     > Building 3  21-1              Great Avenue, a city, a country, this
> >     > planet.
> >     >
> >     > How to take way those things which are too local to an address?
> >     >
> >     > Regards,
> >     >
> >     > Shao
> >     >
> >     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
> >     <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>
> >     > <mailto:shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>>>
> >     wrote:
> >     >
> >     >     I find this is a simple, but important question.
> >     >
> >     >     How best to split numbers and the rest of address?
> >     >
> >     >     For instance, one tricky one is as follows:
> >     >
> >     >     21-1 Great Avenue, a city, a country, this planet
> >     >
> >     >     How to turn this into the following:
> >     >
> >     >     column 1,       column 2
> >     >
> >     >       21-1              Great Avenue, a city, a country, this
> planet
> >     >
> >     >     Note:  there is a hyphen in  21-1
> >     >
> >     >     Any clue?
> >     >
> >     >     Regards,
> >     >
> >     >     Shao
> >     >
> >     >
> >     > _______________________________________________
> >     > postgis-users mailing list
> >     > postgis-users at lists.osgeo.org <mailto:
> postgis-users at lists.osgeo.org>
> >     > https://lists.osgeo.org/mailman/listinfo/postgis-users
> >
> >     _______________________________________________
> >     postgis-users mailing list
> >     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
> >     https://lists.osgeo.org/mailman/listinfo/postgis-users
> >
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200426/e8642517/attachment.html>


More information about the postgis-users mailing list