[postgis-users] Various ways to handle addresses in postgresql

Stephen Woodbridge stephenwoodbridge37 at gmail.com
Sat Apr 25 19:08:47 PDT 2020


On 4/25/2020 7:19 PM, Shaozhong SHI wrote:
> Hi, Steve,
>
> Many thanks.  Please send me the link to parse_address() and 
> standardize_address().

If you already have postGIS installed then

create extension address_standardizer;

# \df parse_address
List of functions
  Schema |     Name      | Result data type | Argument data types |  Type
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------
  public | parse_address | record           | text, OUT num text, OUT 
street text, OUT street2 text, OUT address1 text, OUT city text, OUT 
state text, OUT zip text, OUT zipplus text, OUT country text | normal
(1 row)

# select * from parse_address('123-2 main street city ny');
   num  |   street    | street2 |     address1      | city | state | zip 
| zipplus | country
-------+-------------+---------+-------------------+------+-------+-----+---------+---------
  123-2 | main street |         | 123-2 main street | city | NY |     
|         | US
(1 row)

# \df standardize*
                                                     List of functions
  Schema |        Name         | Result data type |                      
Argument data types                      | Type
--------+---------------------+------------------+---------------------------------------------------------------+--------
  public | standardize_address | stdaddr          | lextab text, gaztab 
text, rultab text, address text           | normal
  public | standardize_address | stdaddr          | lextab text, gaztab 
text, rultab text, micro text, macro text | normal

You need tables for the lexicon, gazetteer, and rules, which should be 
included in the extension but I'm not seeing them. So you can grab these 
from:

https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql
https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql

and load them like:

psql mydb -f us-gaz.sql
psql mydb -f us-lex.sql
psql mydb -f us-rules.sql

# select * from standardize_address('lex', 'gaz', 'rules', '123-2 main 
street city ny');
  building | house_num | predir | qual | pretype |  name  | suftype | 
sufdir | ruralroute | extra | city |  state   | country | postcode | box 
| unit
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------
           | 123       |        |      |         | 2 MAIN | STREET 
|        |            |       | CITY | NEW YORK | USA     | |     |
(1 row)


This is a good example of why parsing addresses is so difficult. The 
rules for standardize_address do not account for a house number like 
"123-2", but the regexp in parse_address do handle it. It is easy to get 
the 80% right and very hard to get it much above that.

-Steve


>
> I need to find these first before test-running.
>
> Regards,
>
> Shao
>
> On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge 
> <stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>> 
> wrote:
>
>     Shao,
>
>     '^( *Building *[0-9]+)?[- 0-9]*'
>
>     or something like that should do it. But I think you will find that a
>     more robust solution is to use parse_address() and/or
>     standardize_address() as they will recognize a lot of other address
>     constructs, like "apt 3a" for for example.
>
>     parse_address() that a text field and breaks it into "house number
>     street name" and "city state zip", but only works well in North
>     America.
>
>     standardize_address() that comes with postGIS, breaks the address
>     down
>     into its components and can separate out things like buildings, and
>     apartment/unit specifiers so you can then take the fields you are
>     interested in and recombine just them in a new string. Again, this
>     works
>     best in North America.
>
>     My github address-standardizer is built to recognize address for most
>     counties, but it can also be configured to recognize address
>     standards
>     for any county without too much effort. It compiles and installs as
>     postgresql extension.
>
>     Addresses are generally very messy and unless your addresses are vary
>     simple you will be constantly fighting with this or that exception.
>
>     -Steve
>
>     On 4/25/2020 2:55 PM, Shaozhong SHI wrote:
>     > Is there a way to left trim including the building and number?
>     >
>     > Building 3  21-1              Great Avenue, a city, a country, this
>     > planet.
>     >
>     > How to take way those things which are too local to an address?
>     >
>     > Regards,
>     >
>     > Shao
>     >
>     > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI
>     <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>
>     > <mailto:shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>>>
>     wrote:
>     >
>     >     I find this is a simple, but important question.
>     >
>     >     How best to split numbers and the rest of address?
>     >
>     >     For instance, one tricky one is as follows:
>     >
>     >     21-1 Great Avenue, a city, a country, this planet
>     >
>     >     How to turn this into the following:
>     >
>     >     column 1,       column 2
>     >
>     >       21-1              Great Avenue, a city, a country, this planet
>     >
>     >     Note:  there is a hyphen in  21-1
>     >
>     >     Any clue?
>     >
>     >     Regards,
>     >
>     >     Shao
>     >
>     >
>     > _______________________________________________
>     > postgis-users mailing list
>     > postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>     > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>     _______________________________________________
>     postgis-users mailing list
>     postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>     https://lists.osgeo.org/mailman/listinfo/postgis-users
>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users



More information about the postgis-users mailing list