[postgis-users] One line solution: select a dirty full address and get a clean address

Shaozhong SHI shishaozhong at gmail.com
Mon Dec 21 08:40:06 PST 2020


Hi,  Steve W,

Many thanks,

How best to see where two addresses match.

For instance, one would say:  101d--120a Some Great Street and 104d-110d
Some Great Street match.

How best to do it?

Would it be possible to turn both ranges into arrays, and then intersect
the range to check out?

Please enlighten me.

Regards,

David

On Sun, 20 Dec 2020 at 00:30, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:

> On 12/19/2020 7:24 PM, Stephen Woodbridge wrote:
> > Hi David,
> >
> > Yes and No! you can do that for some simple addresses, but trying to
> > clean up addresses in general is much more complicated. I've been
> > doing this for 20+ years writing geocoders. Check out:
> >
> > https://github.com/woodbri/address-standardizer
> >
> > This will install as a postgresql extension and once you set it up it
> > has an easy function call that can be used to clean things up.
> > Read this to get a better idea:
> >
> >
> https://github.com/woodbri/address-standardizer/blob/develop/DOCUMENTATION.md
> >
> >
> > And there is a docs/ directory with more documentation. I also provide
> > sample starter files for parsing addresses of about 25 countries, so
> > you can pick the one(s) you need and then customize them for your needs.
> >
> > If you just want to try working with simple regex you probably want to
> > read:
> >
> > https://www.postgresqltutorial.com/regexp_replace/
> >
> > |SELECT REGEXP_REPLACE( REGEXP_REPLACE||(string, |'[, .]+', '
> > '),       -- convert comma, space, period to a single space
> >                 '^ *|--| *$', '')     -- remove leading and trailing
> > white space and '--'
> >        );
>
> But this will not handle "101 - 103 -- a" without adding more clauses.
> And how do you plan to handle things like abbreviations? Ave, ave,
> avenue, st, street, saint, or lots of other issues. A lot depends on
> your use case, are you geocoding these are just trying to clean up some
> text.
> >
> >
> > -Steve W
> >
> > On 12/19/2020 5:40 PM, Shaozhong SHI wrote:
> >> For instance,
> >>
> >> We may encounter,
> >>
> >> My Great Place. 101-103--a, Great Avenue, London, , SW3 5EE
> >>
> >> Can we do some thing like
> >>
> >> SELECT
> >>     REGEXP_MATCHES(' My Great' Place. 101-103--a, Great Avenue,
> >> London, , SW3 5EE ',
> >>          a_regular_expression,
> >>         'g');
> >>
> >> so that we end up a clean address line with one line of code?
> >>
> >> My Great Place 101-103a Great Avenue London SW3 5EE
> >>
> >> What regex is that?
> >>
> >> Can we guarantee a single space between words?
> >>
> >> Regards,
> >>
> >> David
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org
> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20201221/28373e93/attachment.html>


More information about the postgis-users mailing list