[postgis-users] One line solution: select a dirty full address and get a clean address
Stephen Woodbridge
stephenwoodbridge37 at gmail.com
Sat Dec 19 16:30:26 PST 2020
On 12/19/2020 7:24 PM, Stephen Woodbridge wrote:
> Hi David,
>
> Yes and No! you can do that for some simple addresses, but trying to
> clean up addresses in general is much more complicated. I've been
> doing this for 20+ years writing geocoders. Check out:
>
> https://github.com/woodbri/address-standardizer
>
> This will install as a postgresql extension and once you set it up it
> has an easy function call that can be used to clean things up.
> Read this to get a better idea:
>
> https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md
>
>
> And there is a docs/ directory with more documentation. I also provide
> sample starter files for parsing addresses of about 25 countries, so
> you can pick the one(s) you need and then customize them for your needs.
>
> If you just want to try working with simple regex you probably want to
> read:
>
> https://www.postgresqltutorial.com/regexp_replace/
>
> |SELECT REGEXP_REPLACE( REGEXP_REPLACE||(string, |'[, .]+', '
> '), -- convert comma, space, period to a single space
> '^ *|--| *$', '') -- remove leading and trailing
> white space and '--'
> );
But this will not handle "101 - 103 -- a" without adding more clauses.
And how do you plan to handle things like abbreviations? Ave, ave,
avenue, st, street, saint, or lots of other issues. A lot depends on
your use case, are you geocoding these are just trying to clean up some
text.
>
>
> -Steve W
>
> On 12/19/2020 5:40 PM, Shaozhong SHI wrote:
>> For instance,
>>
>> We may encounter,
>>
>> My Great Place. 101-103--a, Great Avenue, London, , SW3 5EE
>>
>> Can we do some thing like
>>
>> SELECT
>> REGEXP_MATCHES(' My Great' Place. 101-103--a, Great Avenue,
>> London, , SW3 5EE ',
>> a_regular_expression,
>> 'g');
>>
>> so that we end up a clean address line with one line of code?
>>
>> My Great Place 101-103a Great Avenue London SW3 5EE
>>
>> What regex is that?
>>
>> Can we guarantee a single space between words?
>>
>> Regards,
>>
>> David
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
More information about the postgis-users
mailing list