[postgis-users] One line solution: select a dirty full address and get a clean address

Stephen Woodbridge stephenwoodbridge37 at gmail.com
Sat Dec 19 16:24:22 PST 2020


Hi David,

Yes and No! you can do that for some simple addresses, but trying to 
clean up addresses in general is much more complicated. I've been doing 
this for 20+ years writing geocoders. Check out:

https://github.com/woodbri/address-standardizer

This will install as a postgresql extension and once you set it up it 
has an easy function call that can be used to clean things up.
Read this to get a better idea:

https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md

And there is a docs/ directory with more documentation. I also provide 
sample starter files for parsing addresses of about 25 countries, so you 
can pick the one(s) you need and then customize them for your needs.

If you just want to try working with simple regex you probably want to read:

https://www.postgresqltutorial.com/regexp_replace/

|SELECT REGEXP_REPLACE( REGEXP_REPLACE||(string, |'[, .]+', ' '),       -- convert comma, space, period to a single space
                 '^ *|--| *$', '')     -- remove leading and trailing white space and '--'
        );


-Steve W

On 12/19/2020 5:40 PM, Shaozhong SHI wrote:
> For instance,
>
> We may encounter,
>
> My Great Place. 101-103--a, Great Avenue, London, , SW3 5EE
>
> Can we do some thing like
>
> SELECT
>     REGEXP_MATCHES(' My Great' Place. 101-103--a, Great Avenue, 
> London, , SW3 5EE ',
>          a_regular_expression,
>         'g');
>
> so that we end up a clean address line with one line of code?
>
> My Great Place 101-103a Great Avenue London SW3 5EE
>
> What regex is that?
>
> Can we guarantee a single space between words?
>
> Regards,
>
> David
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users



More information about the postgis-users mailing list