<div dir="ltr"><div>There are lots of classes of tricky cases:</div><div><br></div><div>For example:</div><div><br></div><div> 1234 County Road E, Hudson </div><div><br></div><div>Is actually "County Road E", not "County Road East". The county roads there are indexed by letters. There is also a County Road F, and a County Road N, and a County Road FF, etc.) Last I checked, even Google, with all their data, gets this one wrong.</div><div><br></div><div>In general, address parsing is a hard problem and there are always (at least in the US) going to be cases that are ambiguous to parse. The people who have been answering your questions on this list have been working on this problem for as long as I can remember. And they have spent many years developing libraries that make dealing with many of the corner cases of address parsing easier. They started these libraries because the quick and simple solutions you seem to be hoping for failed to deal with the realities of addressing. There is not a perfect solution, but what they are suggesting are the best solutions available.</div><div><br></div><div>If you want to start over and reinvent a simpler solution, go ahead. But be advised that the people answering you have been down that path. They know what the next problem will be, and the problem after that, and they know that at the end of the chain of problems you will discover you need something like the tools and libraries they are suggesting.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, May 11, 2020 at 12:31 PM Paul Ramsey <<a href="mailto:pramsey@cleverelephant.ca">pramsey@cleverelephant.ca</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">There's a huge range of potential functionality/features that can reside under the banner "fuzzy address matching". <br>
<br>
At the simplest, you're trying to catch common lexicographic differences, like off-by-one addresses or alternate spellings. This is the realm of trigrams and levenstein distances.<br>
Then you start dealing with different forms of abbreviations (rd, road, r) and formats (unit 4, #4, apt 4, 4). This is the realm of data-less standardizers.<br>
Then you start dealing with larger forms of aliasing. Standard city and state names, recognizing major components of addresses. This is the real of dictionary-backed standardizers.<br>
Then you start dealing with all forms of aliasing. This is actually getting down into geocoding, and address standardizers backed by complete address and road databases.<br>
<br>
What people mean when they say "can you do address standardization" can vary massively. It can also be frustrating because the performance of something like Google's geocoder algorithm, backed by the largest, and most up-to-data geographic database in the world gets casually compared to simple format and dictionary backed standardizer by folks with no understanding of the complexity or amount of data that lives under the covers of this "simple" task. I think people are far more understanding of something like machine translation and "get" that it's a really hard problem, because learning a new language is hard. But geocoding is "easy", anyone can look at an address and then look that address up in a map book (ha ha, well anyone over 40 at least).<br>
<br>
P.<br>
<br>
<br>
> On May 11, 2020, at 10:15 AM, Imre Samu <<a href="mailto:pella.samu@gmail.com" target="_blank">pella.samu@gmail.com</a>> wrote:<br>
> <br>
> > ... Fuzzy Address Matching? ... <br>
> > For instance, South Great Avenue, A City, Planet Earth may be put down as the following:<br>
> <br>
> sometimes .. the same country/city/building - has a different names .. <br>
> <br>
> "A City" example:<br>
> <a href="https://en.wikipedia.org/wiki/Vienna" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Vienna</a><br>
> <br>
> Vienna:<br>
> "Beč (Croatian, Serbian, older Bulgarian), Beç (older Turkish)*, Bech or Vidnya (Romani), Bécs (Hungarian)*, Bin / Pin - 빈 (Korean), Dunaj (Slovene)*, Fienna (Welsh), Vedunia (Celtic), Vena - Вена (Russian), Vídeň (Czech)*, Viden' / Videň (Ukrainian)*, Viedeň (Slovak), Viên (Vietnamese), Viena / Vijena/ Виена (Belarusian, Bulgarian, Macedonian), Viena (Catalan*, Lithuanian, Portuguese*, Romanian*, Spanish*, Tagalog*), Vienna (Italian)*, Vienne (French)*, Viénni - Βιέννη (Greek), Vieno (Esperanto), Viin (Estonian), Vin - װין (Yiddish), Vín (Irish, Icelandic), Vina - וינה (Hebrew), Vínarborg (Icelandic variant), Vindobona (Latin), Vīne (Latvian)*, Viyana (Turkish)*, Vjenë (Albanian), Vjenna (Maltese), Vyana (Azeri), Wean (local Viennese, Austrian and Bavarian dialects)*, Weiyena - 維也納 (Chinese)*, Wene (Afrikaans), Wenen (Dutch)*, Wiedeń (Polish)*, Wien (Danish*, Finnish*, German*, Norwegian*, Swedish*), Wīn - ウィーン (Japanese)*, Wina (Indonesian), فيينا (Arabic), وين (Persian)"<br>
> "<br>
> see: <a href="https://en.wikipedia.org/wiki/Names_of_European_cities_in_different_languages_(U%E2%80%93Z)#V" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Names_of_European_cities_in_different_languages_(U%E2%80%93Z)#V</a><br>
> <a href="https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/List_of_names_of_European_cities_in_different_languages</a><br>
> <br>
> or see the Nominatim names for Vienna:<br>
> <a href="https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=109166&class=boundary" rel="noreferrer" target="_blank">https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=109166&class=boundary</a><br>
> <br>
> <br>
> And sometimes - the buildings has a different names ...<br>
> <a href="https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=11101&class=amenity" rel="noreferrer" target="_blank">https://nominatim.openstreetmap.org/details.php?osmtype=R&osmid=11101&class=amenity</a><br>
> - Wien Rathaus (name)<br>
> - Ратуша (name:be)<br>
> - Vídeňská radnice (name:cs)<br>
> - Rathaus (name:de)<br>
> - Vienna City Hall (name:en)<br>
> - 市庁舎 (name:ja)<br>
> - Ратуша (name:ru)<br>
> - Віденська ратуша (name:uk)<br>
> <br>
> <br>
> <br>
> Best,<br>
> Imre<br>
> <br>
> <br>
> <br>
> <br>
> Shaozhong SHI <<a href="mailto:shishaozhong@gmail.com" target="_blank">shishaozhong@gmail.com</a>> ezt írta (időpont: 2020. máj. 11., H, 18:24):<br>
> Hello,<br>
> <br>
> I got a few questions as follows:<br>
> <br>
> 1. Which one is the best way for Fuzzy Address Matching?<br>
> <br>
> 2. FME FuzzyStringComparer uses Python difflib module. Which one in Postgres is equivalent or similar to it?<br>
> <br>
> 3. Often, addresses collected by different people may well be correct. But, there may be typing errors, or addresses are composed not in a consistent manner.<br>
> <br>
> For instance, South Great Avenue, A City, Planet Earth may be put down as the following:<br>
> <br>
> S. Great Aveue, City A, Earth Planet<br>
> Great Avene South, A City, Earth Planet<br>
> Great Avenue S, A City, Planet Earth<br>
> <br>
> Surely, there would be solutions to deal with this problem.<br>
> <br>
> Can anyone enlighten me?<br>
> <br>
> Regards,<br>
> <br>
> Shao<br>
> _______________________________________________<br>
> postgis-users mailing list<br>
> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
> _______________________________________________<br>
> postgis-users mailing list<br>
> <a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
> <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
<br>
_______________________________________________<br>
postgis-users mailing list<br>
<a href="mailto:postgis-users@lists.osgeo.org" target="_blank">postgis-users@lists.osgeo.org</a><br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users" rel="noreferrer" target="_blank">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></blockquote></div>