[postgis-users] Various ways to handle addresses in postgresql
Shaozhong SHI
shishaozhong at gmail.com
Sun Jan 10 06:57:23 PST 2021
Hi, Steve,
Another solution appeals me most is as follows:
Given a space delimited full address line, we can parse it to correct
BS7666 format.
Something like house number, street, area, city, postcode
E.g., Nest 2 my nesting place 1B Great Avenue Forest Park London WS22 5TT
Can you enlighten me about that?
Regards
,
David
On Saturday, 9 January 2021, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:
> David,
>
> This is the link to the address standardizer:
> https://github.com/woodbri/address-standardizer
>
> This is a link to all my code that I developed consulting. It includes a
> few SQL geocoders based on the code above. And has some README files
> discussing how to build a geocoder which is the basis for how the geocoders
> work.
>
> https://github.com/woodbri/imaptools.com
>
> this is the geocoder for Tiger data, but the code is essentially the same
> for every country because the when you load country specific data into the
> database it goes into its own table and then you standardize that data into
> stdstreets table and all queries are done against the stdstreets table and
> you only have to tweak the address range interpolation function which needs
> to access the source streets table for the geometry and house number ranges.
>
> https://github.com/woodbri/imaptools.com/blob/master/sql-scr
> ipts/geocoder/prep-tiger-geo-new.sql
>
> I would approach this by:
>
> 1. get the address standardizer compiled and installed. I can help if you
> run into problems or have questions.
> 2. load your UK street data into rawdata schema, ideally it would be best
> if we can create a table/view that presents this data as a single table
> where each record represents one side of the street and one jurisdiction
> this may mean that a single record in your source data will generate
> multiple records in this table/view (this greatly simplifies the coding and
> performance later)
> 3. look at the prep-tiger-geo-new.sql file
> 4. create a stdstreets table and standardize your table/view data into it
> 5. look at standardization failures and adjust lexicon and grammar as
> needed
> 6. loop back to 4 until good enough
> 7. load functions from prep-tiger-geo-new.sql file and adjust any for your
> data
> 8. try it out!
>
> -Steve
>
>
> On 1/9/2021 10:22 AM, Shaozhong SHI wrote:
>
>> Hi, Stephen,
>>
>> Many thanks. We are interested in it is working with the UK addresses.
>>
>> Please send me the link to this.
>>
>> Regards,
>>
>> David
>>
>> On Sat, 9 Jan 2021 at 15:00, Stephen Woodbridge <
>> stephenwoodbridge37 at gmail.com <mailto:stephenwoodbridge37 at gmail.com>>
>> wrote:
>>
>> David,
>>
>> Yup and this is just one a dozens of cases that you have to deal
>> with. You are dealing with a natural language processing problem.
>> And you have to deal with human input that has typos and
>> abbreviations.
>>
>> These issues are what the address standardizer fixes. It tokenized
>> the address and uses the gazette to standardize the terms and then
>> classifies each term and assigns it to part of the address based
>> on a grammar.
>>
>> So there is a simple solution, use my address standardizer, it is
>> free, MIT license, it has a sample lexicon/ gazette and grammar
>> for the UK, it is easy to modify these to fit your needs, and it
>> just works. Oh if you want to do another county it also has sample
>> files for 25 countries.
>>
>> Sent from my iPhone
>>
>> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski
>>> <me at komzpa.net <mailto:me at komzpa.net>> wrote:
>>>
>>>
>>> Hello,
>>>
>>> People make neural networks for this kind of task:
>>>
>>> https://github.com/openvenues/libpostal
>>> <https://github.com/openvenues/libpostal>
>>>
>>> сб, 9 сту 2021, 12:40 карыстальнік Shaozhong SHI
>>> <shishaozhong at gmail.com <mailto:shishaozhong at gmail.com>> напісаў:
>>>
>>> Hi, Steve W,
>>>
>>> it is easy to parse addresses as tokens. But it is difficult
>>> to put tokens in right columns, due to that the same address
>>> could be expressed with partial address or full address.
>>>
>>> The same address can be written like, Flat 1 122 Great Avenue
>>> London UK, or Flat 1 122 Greet Avenue Central London London
>>> United Kingdom.
>>>
>>> When this happens, each address has different number of
>>> tokens, so different numbers of tokens. Is there a way to
>>> deal with this issue so that each token can get into right
>>> column?
>>>
>>> Please enlighten me.
>>>
>>> Regards,
>>>
>>> David
>>>
>>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge
>>> <stephenwoodbridge37 at gmail.com
>>> <mailto:stephenwoodbridge37 at gmail.com>> wrote:
>>>
>>> And I have create an address-standardizer project here
>>> https://github.com/woodbri/address-standardizer
>>> <https://github.com/woodbri/address-standardizer> which
>>> is user
>>> configurable. I might be over kill is you just want to
>>> strip off the
>>> number, in which case you might just use a SQL regexp
>>> replace to remove it.
>>>
>>> -Steve W
>>>
>>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>> > PostGIS has address_standardizer extension that includes
>>> > parse_address() and standardize_address() functions.
>>> >
>>> > -Steve W
>>> >
>>> > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>> >> > handle addresses in postgresql
>>> >>
>>> >> maybe you can use the
>>> https://github.com/openvenues/libpostal
>>> <https://github.com/openvenues/libpostal> library
>>> >> with your favorite language bindings ( Python / Ruby /
>>> Go / PHP /
>>> >> Node / R / Java ...)
>>> >>
>>> >> or as a Postgres database extension:
>>> >>
>>> https://info.crunchydata.com/blog/quick-and-dirty-address-ma
>>> tching-with-libpostal
>>> <https://info.crunchydata.com/blog/quick-and-dirty-address-m
>>> atching-with-libpostal>
>>>
>>> >>
>>> >> https://github.com/pramsey/pgsql-postal
>>> <https://github.com/pramsey/pgsql-postal>
>>> >>
>>> >> Regards,
>>> >> Imre
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> Shaozhong SHI <shishaozhong at gmail.com
>>> <mailto:shishaozhong at gmail.com>
>>> >> <mailto:shishaozhong at gmail.com
>>> <mailto:shishaozhong at gmail.com>>> ezt írta (időpont:
>>> 2020. ápr. 25.,
>>> >> Szo, 2:49):
>>> >>
>>> >> I find this is a simple, but important question.
>>> >>
>>> >> How best to split numbers and the rest of address?
>>> >>
>>> >> For instance, one tricky one is as follows:
>>> >>
>>> >> 21-1 Great Avenue, a city, a country, this planet
>>> >>
>>> >> How to turn this into the following:
>>> >>
>>> >> column 1, column 2
>>> >>
>>> >> 21-1 Great Avenue, a city, a
>>> country, this planet
>>> >>
>>> >> Note: there is a hyphen in 21-1
>>> >>
>>> >> Any clue?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Shao
>>> >> _______________________________________________
>>> >> postgis-users mailing list
>>> >> postgis-users at lists.osgeo.org
>>> <mailto:postgis-users at lists.osgeo.org>
>>> <mailto:postgis-users at lists.osgeo.org
>>> <mailto:postgis-users at lists.osgeo.org>>
>>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> postgis-users mailing list
>>> >> postgis-users at lists.osgeo.org
>>> <mailto:postgis-users at lists.osgeo.org>
>>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>> >
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> <mailto:postgis-users at lists.osgeo.org>
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> <mailto:postgis-users at lists.osgeo.org>
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org <mailto:postgis-users at lists.osgeo.org>
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>> <https://lists.osgeo.org/mailman/listinfo/postgis-users>
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20210110/f8572ce2/attachment.html>
More information about the postgis-users
mailing list