[postgis-users] Fuzzy Address Matching - PostgreSql equivalent to FuzzyStringComparer using Python difflib module

Paul Ramsey pramsey at cleverelephant.ca
Mon May 11 09:30:00 PDT 2020


It's not an easy problem. There is no one guaranteed magic bullet.

Use the address_standardizer extension, particularly for North American addressing.

  https://postgis.net/docs/postgis_installation.html#installing_pagc_address_standardizer

Or use an ML-trained standardizer like this one.

  https://github.com/pramsey/pgsql-postal

Or gate out to a geocoding service using a web service call.

  https://docs.google.com/presentation/d/1Fgc_2dzWAzT--HdMEiWj2fFLJNnpxPXmnYXx9Js3xjE/edit

To handball some fuzzy stuff, use the functions in the PostgreSQL fuzzystrmatch contrib module:

  create extension fuzzystrmatch;

The Python utility is really just using different ratios of string length and Levenshtein distance; it ain't rocket science.
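
As a rough illustration of that kind of ratio, here is a minimal Python sketch that scores two strings from their Levenshtein distance and the longer string's length. The helper names (levenshtein, levenshtein_ratio, difflib_ratio) and the normalisation 1 - distance / max(length) are assumptions made for illustration, not the formula FME or difflib uses. In particular, difflib's SequenceMatcher.ratio() is computed from matching blocks (2 * M / T), not from edit distance, so the two scores are similar in spirit but will not agree exactly.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """Illustrative score in [0, 1]: 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def difflib_ratio(a: str, b: str) -> float:
    """Score from the standard library, based on matching blocks."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    reference = "South Great Avenue, A City, Planet Earth"
    candidates = [
        "S. Great Aveue, City A, Earth Planet",
        "Great Avene South, A City, Earth Planet",
        "Great Avenue S, A City, Planet Earth",
    ]
    for candidate in candidates:
        print(candidate,
              round(levenshtein_ratio(reference, candidate), 3),
              round(difflib_ratio(reference, candidate), 3))
```

On the PostgreSQL side, fuzzystrmatch provides levenshtein(text, text), so the same kind of ratio can be built by dividing its result by the length of the longer string. Reordered address parts ("A City" vs "City A") will still drag either score down, which is why standardizing first tends to matter more than the choice of metric.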

P.


> On May 11, 2020, at 9:24 AM, Shaozhong SHI <shishaozhong at gmail.com> wrote:
> 
> Hello,
> 
> I got a few questions as follows:
> 
> 1.  Which one is the best way for Fuzzy Address Matching?
> 
> 2.  FME FuzzyStringComparer uses  Python difflib module.  Which one in Postgres is equivalent or similar to it?
> 
> 3.  Often, addresses collected by different people may well be correct.  But, there may be typing errors, or addresses are composed not in a consistent manner.
> 
> For instance, South Great Avenue, A City, Planet Earth may be put down as the following:
> 
> S. Great Aveue, City A, Earth Planet
> Great Avene South, A City, Earth Planet
> Great Avenue S, A City, Planet Earth
> 
> Surely, there would be solutions to deal with this problem.
> 
> Can anyone enlighten me?
> 
> Regards,
> 
> Shao
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users


