[postgis-users] One line solution: select a dirty full address and get a clean address
Stephen Woodbridge
stephenwoodbridge37 at gmail.com
Sat Dec 19 16:24:22 PST 2020
Hi David,
Yes and No! you can do that for some simple addresses, but trying to
clean up addresses in general is much more complicated. I've been doing
this for 20+ years writing geocoders. Check out:
https://github.com/woodbri/address-standardizer
This will install as a postgresql extension and once you set it up it
has an easy function call that can be used to clean things up.
Read this to get a better idea:
https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md
And there is a docs/ directory with more documentation. I also provide
sample starter files for parsing addresses of about 25 countries, so you
can pick the one(s) you need and then customize them for your needs.
If you just want to try working with simple regex you probably want to read:
https://www.postgresqltutorial.com/regexp_replace/
|SELECT REGEXP_REPLACE( REGEXP_REPLACE||(string, |'[, .]+', ' '), -- convert comma, space, period to a single space
'^ *|--| *$', '') -- remove leading and trailing white space and '--'
);
-Steve W
On 12/19/2020 5:40 PM, Shaozhong SHI wrote:
> For instance,
>
> We may encounter,
>
> My Great Place. 101-103--a, Great Avenue, London, , SW3 5EE
>
> Can we do some thing like
>
> SELECT
> REGEXP_MATCHES(' My Great' Place. 101-103--a, Great Avenue,
> London, , SW3 5EE ',
> a_regular_expression,
> 'g');
>
> so that we end up a clean address line with one line of code?
>
> My Great Place 101-103a Great Avenue London SW3 5EE
>
> What regex is that?
>
> Can we guarantee a single space between words?
>
> Regards,
>
> David
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
More information about the postgis-users
mailing list