[postgis-users] Various ways to handle addresses in postgresql
Shaozhong SHI
shishaozhong at gmail.com
Sat Jan 9 08:02:56 PST 2021
Hi, Steve,
Many thanks.
We are strongly interested in dealing with UK addresses.
Last time, I passed on messages for testing. I was told not working for UK
addresses.
I will try it again. Anything else do we have to watch out for?
Regards,
David
On Sat, 9 Jan 2021 at 15:53, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:
> David,
>
> This is the link to the address standardizer:
> https://github.com/woodbri/address-standardizer
>
> This is a link to all my code that I developed consulting. It includes a
> few SQL geocoders based on the code above. And has some README files
> discussing how to build a geocoder which is the basis for how the
> geocoders work.
>
> https://github.com/woodbri/imaptools.com
>
> this is the geocoder for Tiger data, but the code is essentially the
> same for every country because the when you load country specific data
> into the database it goes into its own table and then you standardize
> that data into stdstreets table and all queries are done against the
> stdstreets table and you only have to tweak the address range
> interpolation function which needs to access the source streets table
> for the geometry and house number ranges.
>
>
> https://github.com/woodbri/imaptools.com/blob/master/sql-scripts/geocoder/prep-tiger-geo-new.sql
>
> I would approach this by:
>
> 1. get the address standardizer compiled and installed. I can help if
> you run into problems or have questions.
> 2. load your UK street data into rawdata schema, ideally it would be
> best if we can create a table/view that presents this data as a single
> table where each record represents one side of the street and one
> jurisdiction this may mean that a single record in your source data will
> generate multiple records in this table/view (this greatly simplifies
> the coding and performance later)
> 3. look at the prep-tiger-geo-new.sql file
> 4. create a stdstreets table and standardize your table/view data into it
> 5. look at standardization failures and adjust lexicon and grammar as
> needed
> 6. loop back to 4 until good enough
> 7. load functions from prep-tiger-geo-new.sql file and adjust any for
> your data
> 8. try it out!
>
> -Steve
>
>
> On 1/9/2021 10:22 AM, Shaozhong SHI wrote:
> > Hi, Stephen,
> >
> > Many thanks. We are interested in it is working with the UK addresses.
> >
> > Please send me the link to this.
> >
> > Regards,
> >
> > David
> >
> > On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge
> > <stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>>
> > wrote:
> >
> > David,
> >
> > Yup and this is just one a dozens of cases that you have to deal
> > with. You are dealing with a natural language processing problem.
> > And you have to deal with human input that has typos and
> > abbreviations.
> >
> > These issues are what the address standardizer fixes. It tokenized
> > the address and uses the gazette to standardize the terms and then
> > classifies each term and assigns it to part of the address based
> > on a grammar.
> >
> > So there is a simple solution, use my address standardizer, it is
> > free, MIT license, it has a sample lexicon/ gazette and grammar
> > for the UK, it is easy to modify these to fit your needs, and it
> > just works. Oh if you want to do another county it also has sample
> > files for 25 countries.
> >
> > Sent from my iPhone
> >
> >> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski
> >> <me at komzpa.net <mailto:me at komzpa.net>> wrote:
> >>
> >>
> >> Hello,
> >>
> >> People make neural networks for this kind of task:
> >>
> >> https://github.com/openvenues/libpostal
> >> <https://github.com/openvenues/libpostal>
> >>
> >> сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI
> >> <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>> напісаў:
> >>
> >> Hi, Steve W,
> >>
> >> it is easy to parse addresses as tokens. But it is difficult
> >> to put tokens in right columns, due to that the same address
> >> could be expressed with partial address or full address.
> >>
> >> The same address can be written like, Flat 1 122 Great Avenue
> >> London UK, or Flat 1 122 Greet Avenue Central London London
> >> United Kingdom.
> >>
> >> When this happens, each address has different number of
> >> tokens, so different numbers of tokens. Is there a way to
> >> deal with this issue so that each token can get into right
> >> column?
> >>
> >> Please enlighten me.
> >>
> >> Regards,
> >>
> >> David
> >>
> >> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge
> >> <stephenwoodbridge37 at gmail.com
> >> <mailto:stephenwoodbridge37 at gmail.com>> wrote:
> >>
> >> And I have create an address-standardizer project here
> >> https://github.com/woodbri/address-standardizer
> >> <https://github.com/woodbri/address-standardizer> which
> >> is user
> >> configurable. I might be over kill is you just want to
> >> strip off the
> >> number, in which case you might just use a SQL regexp
> >> replace to remove it.
> >>
> >> -Steve W
> >>
> >> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
> >> > PostGIS has address_standardizer extension that includes
> >> > parse_address() and standardize_address() functions.
> >> >
> >> > -Steve W
> >> >
> >> > On 4/24/2020 9:54 PM, Imre Samu wrote:
> >> >> > handle addresses in postgresql
> >> >>
> >> >> maybe you can use the
> >> https://github.com/openvenues/libpostal
> >> <https://github.com/openvenues/libpostal> library
> >> >> with your favorite language bindings ( Python / Ruby /
> >> Go / PHP /
> >> >> Node / R / Java ...)
> >> >>
> >> >> or as a Postgres database extension:
> >> >>
> >>
> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
> >> <
> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
> >
> >>
> >> >>
> >> >> https://github.com/pramsey/pgsql-postal
> >> <https://github.com/pramsey/pgsql-postal>
> >> >>
> >> >> Regards,
> >> >> Imre
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> Shaozhong SHI <shishaozhong at gmail.com
> >> <mailto:shishaozhong at gmail.com>
> >> >> <mailto:shishaozhong at gmail.com
> >> <mailto:shishaozhong at gmail.com>>> ezt írta (időpont:
> >> 2020. ápr. 25.,
> >> >> Szo, 2:49):
> >> >>
> >> >> I find this is a simple, but important question.
> >> >>
> >> >> How best to split numbers and the rest of address?
> >> >>
> >> >> For instance, one tricky one is as follows:
> >> >>
> >> >> 21-1 Great Avenue, a city, a country, this planet
> >> >>
> >> >> How to turn this into the following:
> >> >>
> >> >> column 1, column 2
> >> >>
> >> >> 21-1 Great Avenue, a city, a
> >> country, this planet
> >> >>
> >> >> Note: there is a hyphen in 21-1
> >> >>
> >> >> Any clue?
> >> >>
> >> >> Regards,
> >> >>
> >> >> Shao
> >> >> _______________________________________________
> >> >> postgis-users mailing list
> >> >> postgis-users at lists.osgeo.org
> >> <mailto:postgis-users at lists.osgeo.org>
> >> <mailto:postgis-users at lists.osgeo.org
> >> <mailto:postgis-users at lists.osgeo.org>>
> >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> postgis-users mailing list
> >> >> postgis-users at lists.osgeo.org
> >> <mailto:postgis-users at lists.osgeo.org>
> >> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >> >
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org
> >> <mailto:postgis-users at lists.osgeo.org>
> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org
> >> <mailto:postgis-users at lists.osgeo.org>
> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >>
> >> _______________________________________________
> >> postgis-users mailing list
> >> postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org
> >
> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
> >> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
> > <https://lists.osgeo.org/mailman/listinfo/postgis-users>
> >
> >
> > _______________________________________________
> > postgis-users mailing list
> > postgis-users at lists.osgeo.org
> > https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210109/b4fb13f9/attachment.html>
More information about the postgis-users
mailing list