[postgis-users] One line solution: select a dirty full address and get a clean address

Stephen Woodbridge stephenwoodbridge37 at gmail.com
Sat Dec 19 16:30:26 PST 2020


On 12/19/2020 7:24 PM, Stephen Woodbridge wrote:
> Hi David,
>
> Yes and No! you can do that for some simple addresses, but trying to 
> clean up addresses in general is much more complicated. I've been 
> doing this for 20+ years writing geocoders. Check out:
>
> https://github.com/woodbri/address-standardizer
>
> This will install as a postgresql extension and once you set it up it 
> has an easy function call that can be used to clean things up.
> Read this to get a better idea:
>
> https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md 
>
>
> And there is a docs/ directory with more documentation. I also provide 
> sample starter files for parsing addresses of about 25 countries, so 
> you can pick the one(s) you need and then customize them for your needs.
>
> If you just want to try working with simple regex you probably want to 
> read:
>
> https://www.postgresqltutorial.com/regexp_replace/
>
> |SELECT REGEXP_REPLACE( REGEXP_REPLACE||(string, |'[, .]+', ' 
> '),       -- convert comma, space, period to a single space
>                 '^ *|--| *$', '')     -- remove leading and trailing 
> white space and '--'
>        );

But this will not handle "101 - 103 -- a" without adding more clauses. 
And how do you plan to handle things like abbreviations? Ave, ave, 
avenue, st, street, saint, or lots of other issues. A lot depends on 
your use case, are you geocoding these are just trying to clean up some 
text.
>
>
> -Steve W
>
> On 12/19/2020 5:40 PM, Shaozhong SHI wrote:
>> For instance,
>>
>> We may encounter,
>>
>> My Great Place. 101-103--a, Great Avenue, London, , SW3 5EE
>>
>> Can we do some thing like
>>
>> SELECT
>>     REGEXP_MATCHES(' My Great' Place. 101-103--a, Great Avenue, 
>> London, , SW3 5EE ',
>>          a_regular_expression,
>>         'g');
>>
>> so that we end up a clean address line with one line of code?
>>
>> My Great Place 101-103a Great Avenue London SW3 5EE
>>
>> What regex is that?
>>
>> Can we guarantee a single space between words?
>>
>> Regards,
>>
>> David
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>



More information about the postgis-users mailing list