<div dir="ltr">Hi, Steve,<div><br></div><div>Thanks.</div><div><br></div><div>2 questions.</div><div><br></div><div>1. How can we remove things like Room 2a, Buildings 2-6b and etc with regexp replace?</div><div>2. Once extensions created, can these functions be adapted? Are codes available ? I will see whether to put it into a project, so that our programmers can have work to do.</div><div><br></div><div>Regards,</div><div><br></div><div>Shao</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Sun, 26 Apr 2020 at 03:09, Stephen Woodbridge <<a href="mailto:stephenwoodbridge37@gmail.com">stephenwoodbridge37@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 4/25/2020 7:19 PM, Shaozhong SHI wrote:<br>
> Hi, Steve,<br>
><br>
> Many thanks. Please send me the link to parse_address() and <br>
> standardize_address().<br>
<br>
If you already have postGIS installed then<br>
<br>
create extension address_standardizer;<br>
<br>
# \df parse_address<br>
List of functions<br>
Schema | Name | Result data type | Argument data types | Type<br>
--------+---------------+------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------+--------<br>
public | parse_address | record | text, OUT num text, OUT <br>
street text, OUT street2 text, OUT address1 text, OUT city text, OUT <br>
state text, OUT zip text, OUT zipplus text, OUT country text | normal<br>
(1 row)<br>
<br>
# select * from parse_address('123-2 main street city ny');<br>
num | street | street2 | address1 | city | state | zip <br>
| zipplus | country<br>
-------+-------------+---------+-------------------+------+-------+-----+---------+---------<br>
123-2 | main street | | 123-2 main street | city | NY | <br>
| | US<br>
(1 row)<br>
<br>
# \df standardize*<br>
List of functions<br>
Schema | Name | Result data type | <br>
Argument data types | Type<br>
--------+---------------------+------------------+---------------------------------------------------------------+--------<br>
public | standardize_address | stdaddr | lextab text, gaztab <br>
text, rultab text, address text | normal<br>
public | standardize_address | stdaddr | lextab text, gaztab <br>
text, rultab text, micro text, macro text | normal<br>
<br>
You need tables for the lexicon, gazetteer, and rules, which should be <br>
included in the extension but I'm not seeing them. So you can grab these <br>
from:<br>
<br>
<a href="https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql" rel="noreferrer" target="_blank">https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-gaz.sql</a><br>
<a href="https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql" rel="noreferrer" target="_blank">https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-lex.sql</a><br>
<a href="https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql" rel="noreferrer" target="_blank">https://raw.githubusercontent.com/woodbri/imaptools.com/master/sql-scripts/geocoder/us-rules.sql</a><br>
<br>
and load them like:<br>
<br>
psql mydb -f us-gaz.sql<br>
psql mydb -f us-lex.sql<br>
psql mydb -f us-rules.sql<br>
<br>
# select * from standardize_address('lex', 'gaz', 'rules', '123-2 main <br>
street city ny');<br>
building | house_num | predir | qual | pretype | name | suftype | <br>
sufdir | ruralroute | extra | city | state | country | postcode | box <br>
| unit<br>
----------+-----------+--------+------+---------+--------+---------+--------+------------+-------+------+----------+---------+----------+-----+------<br>
| 123 | | | | 2 MAIN | STREET <br>
| | | | CITY | NEW YORK | USA | | |<br>
(1 row)<br>
<br>
<br>
This is a good example of why parsing addresses is so difficult. The <br>
rules for standardize_address do not account for a house number like <br>
"123-2", but the regexp in parse_address do handle it. It is easy to get <br>
the 80% right and very hard to get it much above that.<br>
<br>
-Steve<br>
<br>
<br>
><br>
> I need to find these first before test-running.<br>
><br>
> Regards,<br>
><br>
> Shao<br>
><br>
> On Sat, 25 Apr 2020 at 21:20, Stephen Woodbridge <br>
> <<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gmail.com</a> <mailto:<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gmail.com</a>>> <br>
> wrote:<br>
><br>
> Shao,<br>
><br>
> '^( *Building *[0-9]+)?[- 0-9]*'<br>
><br>
> or something like that should do it. But I think you will find that a<br>
> more robust solution is to use parse_address() and/or<br>
> standardize_address() as they will recognize a lot of other address<br>
> constructs, like "apt 3a" for for example.<br>
><br>
> parse_address() that a text field and breaks it into "house number<br>
> street name" and "city state zip", but only works well in North<br>
> America.<br>
><br>
> standardize_address() that comes with postGIS, breaks the address<br>
> down<br>
> into its components and can separate out things like buildings, and<br>
> apartment/unit specifiers so you can then take the fields you are<br>
> interested in and recombine just them in a new string. Again, this<br>
> works<br>
> best in North America.<br>
><br>
> My github address-standardizer is built to recognize address for most<br>
> counties, but it can also be configured to recognize address<br>
> standards<br>
> for any county without too much effort. It compiles and installs as<br>
> postgresql extension.<br>
><br>
> Addresses are generally very messy and unless your addresses are vary<br>
> simple you will be constantly fighting with this or that exception.<br>
><br>
> -Steve<br>
><br>
> On 4/25/2020 2:55 PM, Shaozhong SHI wrote:<br>
> > Is there a way to left trim including the building and number?<br>
> ><br>
> > Building 3 21-1 Great Avenue, a city, a country, this<br>
> > planet.<br>
> ><br>
> > How to take way those things which are too local to an address?<br>
> ><br>
> > Regards,<br>
> ><br>
> > Shao<br>
> ><br>
> > On Sat, 25 Apr 2020 at 01:48, Shaozhong SHI<br>
> <<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a> <mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a>><br>
> > <mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a> <mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a>>>><br>
> wrote:<br>
> ><br>
> > I find this is a simple, but important question.<br>
> ><br>
> > How best to split numbers and the rest of address?<br>
> ><br>
> > For instance, one tricky one is as follows:<br>
> ><br>
> > 21-1 Great Avenue, a city, a country, this planet<br>
> ><br>
> > How to turn this into the following:<br>
> ><br>
> > column 1, column 2<br>
> ><br>
> > 21-1 Great Avenue, a city, a country, this planet<br>
> ><br>
> > Note: there is a hyphen in 21-1<br>
> ><br>
> > Any clue?<br>
> ><br>
> > Regards,<br>
> ><br>
> > Shao<br>
> ><br>
> ><br>
> > _______________________________________________<br>
> > postgis-users mailing list<br>
> > <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a> <mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a>><br>
> > <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
><br>
> _______________________________________________<br>
> postgis-users mailing list<br>
> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a> <mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a>><br>
> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
><br>
><br>
> _______________________________________________<br>
> postgis-users mailing list<br>
> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>