Hi, Steve,<div><br></div><div>Another solution appeals me most is as follows:</div><div><br></div><div>Given a space delimited full address line, we can parse it to correct BS7666 format.</div><div><br></div><div>Something like house number, street, area, city, postcode</div><div><br></div><div>E.g., Nest 2 my nesting place 1B Great Avenue Forest Park London WS22 5TT</div><div>Can you enlighten me about that?</div><div><br></div><div>Regards<br>,</div><div>David</div><div><br>On Saturday, 9 January 2021, Stephen Woodbridge <<a href="mailto:stephenwoodbridge37@gmail.com">stephenwoodbridge37@gmail.com</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">David,<br>
<br>
This is the link to the address standardizer:<br>
<a href="https://github.com/woodbri/address-standardizer" target="_blank">https://github.com/woodbri/add<wbr>ress-standardizer</a><br>
<br>
This is a link to all my code that I developed consulting. It includes a few SQL geocoders based on the code above. And has some README files discussing how to build a geocoder which is the basis for how the geocoders work.<br>
<br>
<a href="https://github.com/woodbri/imaptools.com" target="_blank">https://github.com/woodbri/ima<wbr>ptools.com</a><br>
<br>
this is the geocoder for Tiger data, but the code is essentially the same for every country because the when you load country specific data into the database it goes into its own table and then you standardize that data into stdstreets table and all queries are done against the stdstreets table and you only have to tweak the address range interpolation function which needs to access the source streets table for the geometry and house number ranges.<br>
<br>
<a href="https://github.com/woodbri/imaptools.com/blob/master/sql-scripts/geocoder/prep-tiger-geo-new.sql" target="_blank">https://github.com/woodbri/ima<wbr>ptools.com/blob/master/sql-scr<wbr>ipts/geocoder/prep-tiger-geo-<wbr>new.sql</a><br>
<br>
I would approach this by:<br>
<br>
1. get the address standardizer compiled and installed. I can help if you run into problems or have questions.<br>
2. load your UK street data into rawdata schema, ideally it would be best if we can create a table/view that presents this data as a single table where each record represents one side of the street and one jurisdiction this may mean that a single record in your source data will generate multiple records in this table/view (this greatly simplifies the coding and performance later)<br>
3. look at the prep-tiger-geo-new.sql file<br>
4. create a stdstreets table and standardize your table/view data into it<br>
5. look at standardization failures and adjust lexicon and grammar as needed<br>
6. loop back to 4 until good enough<br>
7. load functions from prep-tiger-geo-new.sql file and adjust any for your data<br>
8. try it out!<br>
<br>
-Steve<br>
<br>
<br>
On 1/9/2021 10:22 AM, Shaozhong SHI wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Hi, Stephen,<br>
<br>
Many thanks. We are interested in it is working with the UK addresses.<br>
<br>
Please send me the link to this.<br>
<br>
Regards,<br>
<br>
David<br>
<br>
On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge <<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gmail.com</a> <mailto:<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gm<wbr>ail.com</a>>> wrote:<br>
<br>
David,<br>
<br>
Yup and this is just one a dozens of cases that you have to deal<br>
with. You are dealing with a natural language processing problem.<br>
And you have to deal with human input that has typos and<br>
abbreviations.<br>
<br>
These issues are what the address standardizer fixes. It tokenized<br>
the address and uses the gazette to standardize the terms and then<br>
classifies each term and assigns it to part of the address based<br>
on a grammar.<br>
<br>
So there is a simple solution, use my address standardizer, it is<br>
free, MIT license, it has a sample lexicon/ gazette and grammar<br>
for the UK, it is easy to modify these to fit your needs, and it<br>
just works. Oh if you want to do another county it also has sample<br>
files for 25 countries.<br>
<br>
Sent from my iPhone<br>
<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski<br>
<<a href="mailto:me@komzpa.net" target="_blank">me@komzpa.net</a> <mailto:<a href="mailto:me@komzpa.net" target="_blank">me@komzpa.net</a>>> wrote:<br>
<br>
<br>
Hello,<br>
<br>
People make neural networks for this kind of task:<br>
<br>
<a href="https://github.com/openvenues/libpostal" target="_blank">https://github.com/openvenues/<wbr>libpostal</a><br>
<<a href="https://github.com/openvenues/libpostal" target="_blank">https://github.com/openvenues<wbr>/libpostal</a>><br>
<br>
сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI<br>
<<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a> <mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a><wbr>>> напісаў:<br>
<br>
Hi, Steve W,<br>
<br>
it is easy to parse addresses as tokens. But it is difficult<br>
to put tokens in right columns, due to that the same address<br>
could be expressed with partial address or full address.<br>
<br>
The same address can be written like, Flat 1 122 Great Avenue<br>
London UK, or Flat 1 122 Greet Avenue Central London London<br>
United Kingdom.<br>
<br>
When this happens, each address has different number of<br>
tokens, so different numbers of tokens. Is there a way to<br>
deal with this issue so that each token can get into right<br>
column?<br>
<br>
Please enlighten me.<br>
<br>
Regards,<br>
<br>
David<br>
<br>
On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge<br>
<<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gmail.com</a><br>
<mailto:<a href="mailto:stephenwoodbridge37@gmail.com" target="_blank">stephenwoodbridge37@gm<wbr>ail.com</a>>> wrote:<br>
<br>
And I have create an address-standardizer project here<br>
<a href="https://github.com/woodbri/address-standardizer" target="_blank">https://github.com/woodbri/add<wbr>ress-standardizer</a><br>
<<a href="https://github.com/woodbri/address-standardizer" target="_blank">https://github.com/woodbri/ad<wbr>dress-standardizer</a>> which<br>
is user<br>
configurable. I might be over kill is you just want to<br>
strip off the<br>
number, in which case you might just use a SQL regexp<br>
replace to remove it.<br>
<br>
-Steve W<br>
<br>
On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:<br>
> PostGIS has address_standardizer extension that includes<br>
> parse_address() and standardize_address() functions.<br>
><br>
> -Steve W<br>
><br>
> On 4/24/2020 9:54 PM, Imre Samu wrote:<br>
>> > handle addresses in postgresql<br>
>><br>
>> maybe you can use the<br>
<a href="https://github.com/openvenues/libpostal" target="_blank">https://github.com/openvenues/<wbr>libpostal</a><br>
<<a href="https://github.com/openvenues/libpostal" target="_blank">https://github.com/openvenues<wbr>/libpostal</a>> library<br>
>> with your favorite language bindings ( Python / Ruby /<br>
Go / PHP /<br>
>> Node / R / Java ...)<br>
>><br>
>> or as a Postgres database extension:<br>
>><br>
<a href="https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal" target="_blank">https://info.crunchydata.com/b<wbr>log/quick-and-dirty-address-ma<wbr>tching-with-libpostal</a><br>
<<a href="https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal" target="_blank">https://info.crunchydata.com/<wbr>blog/quick-and-dirty-address-m<wbr>atching-with-libpostal</a>><br>
<br>
>><br>
>> <a href="https://github.com/pramsey/pgsql-postal" target="_blank">https://github.com/pramsey/pgs<wbr>ql-postal</a><br>
<<a href="https://github.com/pramsey/pgsql-postal" target="_blank">https://github.com/pramsey/pg<wbr>sql-postal</a>><br>
>><br>
>> Regards,<br>
>> Imre<br>
>><br>
>><br>
>><br>
>><br>
>> Shaozhong SHI <<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a><br>
<mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a><wbr>><br>
>> <mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a><br>
<mailto:<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a><wbr>>>> ezt írta (időpont:<br>
2020. ápr. 25.,<br>
>> Szo, 2:49):<br>
>><br>
>> I find this is a simple, but important question.<br>
>><br>
>> How best to split numbers and the rest of address?<br>
>><br>
>> For instance, one tricky one is as follows:<br>
>><br>
>> 21-1 Great Avenue, a city, a country, this planet<br>
>><br>
>> How to turn this into the following:<br>
>><br>
>> column 1, column 2<br>
>><br>
>> 21-1 Great Avenue, a city, a<br>
country, this planet<br>
>><br>
>> Note: there is a hyphen in 21-1<br>
>><br>
>> Any clue?<br>
>><br>
>> Regards,<br>
>><br>
>> Shao<br>
>> ______________________________<wbr>_________________<br>
>> postgis-users mailing list<br>
>> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>>><br>
>> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
>><br>
>><br>
>> ______________________________<wbr>_________________<br>
>> postgis-users mailing list<br>
>> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
>> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
><br>
<br>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
<br>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
<br>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a> <mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
</blockquote>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a> <mailto:<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.os<wbr>geo.org</a>><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
<<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailm<wbr>an/listinfo/postgis-users</a>><br>
<br>
<br>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
</blockquote>
<br>
______________________________<wbr>_________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" target="_blank">https://lists.osgeo.org/mailma<wbr>n/listinfo/postgis-users</a><br>
</blockquote></div>