[postgis-users] Various ways to handle addresses in postgresql
Shaozhong SHI
shishaozhong at gmail.com
Sat Jan 9 07:24:47 PST 2021
Hi, Stephen,
Please send me the link to libpostal. I also need information on how to
install it on PostGIS. I need the information to instruct our ICT staff,
so that they can make it ready.
Regards,
David
On Sat, 9 Jan 2021 at 15:04, Stephen Woodbridge <
stephenwoodbridge37 at gmail.com> wrote:
> Or use libpostal as Komяpa suggested and I’m sure there are others also.
> I’m just familiar with my own code and the fact that I built it to work
> inside a postgresql database.
>
> On Jan 9, 2021, at 10:00 AM, Stephen Woodbridge <
> stephenwoodbridge37 at gmail.com> wrote:
>
> David,
>
> Yup, and this is just one of dozens of cases that you have to deal with. You
> are dealing with a natural language processing problem, and you have to
> deal with human input that has typos and abbreviations.
>
> These issues are what the address standardizer fixes. It tokenizes the
> address, uses the gazetteer to standardize the terms, and then classifies
> each term and assigns it to a part of the address based on a grammar.
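
To make the tokenize / standardize / classify idea concrete, here is a minimal
Python sketch. It is not the address-standardizer's actual algorithm: the
LEXICON table, the class names, and the column rules in to_columns() are all
hypothetical and only stand in for a real lexicon/gazetteer and grammar. The
point is that columns are filled by token class, not by token position, so a
partial and a full address land in the same columns.

```python
import re

# Hypothetical mini-lexicon: lowercase token -> (standard form, class).
# A real lexicon/gazetteer is far larger and country-specific.
LEXICON = {
    "flat": ("FLAT", "UNIT_TYPE"),
    "ave": ("AVENUE", "STREET_TYPE"),
    "avenue": ("AVENUE", "STREET_TYPE"),
    "london": ("LONDON", "CITY"),
    "uk": ("UNITED KINGDOM", "COUNTRY"),
}


def classify(address):
    """Tokenize an address and tag each token with a standard form and a class."""
    tokens = re.findall(r"[A-Za-z]+|\d+(?:-\d+)?", address)
    tagged = []
    for tok in tokens:
        if tok[0].isdigit():
            tagged.append((tok, "NUMBER"))
        else:
            # Unknown words are kept (upper-cased) and tagged as plain WORD.
            tagged.append(LEXICON.get(tok.lower(), (tok.upper(), "WORD")))
    return tagged


def to_columns(tagged):
    """Assign tagged tokens to columns with a few hand-written (assumed) rules."""
    cols = {"unit": None, "house_number": None, "street": [],
            "city": None, "country": None}
    prev_cls = None
    for std, cls in tagged:
        if cls == "NUMBER":
            if prev_cls == "UNIT_TYPE":
                cols["unit"] = std            # number right after "FLAT"
            elif cols["house_number"] is None:
                cols["house_number"] = std    # first remaining number
        elif cls in ("WORD", "STREET_TYPE"):
            cols["street"].append(std)
        elif cls == "CITY":
            cols["city"] = std
        elif cls == "COUNTRY":
            cols["country"] = std
        prev_cls = cls
    cols["street"] = " ".join(cols["street"])
    return cols


full = to_columns(classify("Flat 1 122 Great Avenue London UK"))
partial = to_columns(classify("Flat 1 122 Great Ave London"))
```

Both calls give the same unit, house number, street and city; the partial form
simply leaves the country column empty. Multi-word terms such as "United
Kingdom" are not handled by this sketch and would need a phrase-aware lexicon.
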
>
> So there is a simple solution: use my address standardizer. It is free
> (MIT license), it has a sample lexicon/gazetteer and grammar for the UK, it is
> easy to modify these to fit your needs, and it just works. Oh, and if you want
> to do another country, it also has sample files for 25 countries.
>
> On Jan 9, 2021, at 4:42 AM, Darafei Komяpa Praliaskouski <me at komzpa.net>
> wrote:
>
>
> Hello,
>
> People make neural networks for this kind of task:
>
> https://github.com/openvenues/libpostal
>
> On Sat, 9 Jan 2021, 12:40, Shaozhong SHI <shishaozhong at gmail.com>
> wrote:
>
>> Hi, Steve W,
>>
>> It is easy to parse addresses into tokens, but it is difficult to put the
>> tokens in the right columns, because the same address can be written as a
>> partial address or a full address.
>>
>> The same address can be written as "Flat 1 122 Great Avenue London UK"
>> or as "Flat 1 122 Great Avenue Central London London United Kingdom".
>>
>> When this happens, the two forms have different numbers of tokens.
>> Is there a way to deal with this issue so
>> that each token gets into the right column?
>>
>> Please enlighten me.
>>
>> Regards,
>>
>> David
>>
>> On Sat, 25 Apr 2020 at 05:09, Stephen Woodbridge <
>> stephenwoodbridge37 at gmail.com> wrote:
>>
>>> And I have created an address-standardizer project here
>>> https://github.com/woodbri/address-standardizer which is user
>>> configurable. It might be overkill if you just want to strip off the
>>> number, in which case you might just use a SQL regexp replace to remove
>>> it.
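
As an illustration of the regexp approach for the "21-1 Great Avenue" case, here
is a small Python sketch. The pattern and the split_number() helper are my own
assumptions, not something taken from either standardizer, and the pattern would
need adapting before use in a SQL regexp function, since regex dialects differ.

```python
import re

# Leading house number: digits, optionally followed by a hyphen or slash and
# more digits, or by a single letter suffix (e.g. "21", "21-1", "21A").
LEADING_NUMBER = re.compile(r"^\s*(\d+(?:[-/]\d+)?[A-Za-z]?)\b[\s,]*(.*)$")


def split_number(address):
    """Return (number, rest); number is None when no leading number is found."""
    m = LEADING_NUMBER.match(address)
    if m is None:
        return None, address.strip()
    return m.group(1), m.group(2).strip()


number, rest = split_number("21-1 Great Avenue, a city, a country, this planet")
```

Here number holds "21-1" and rest holds everything after it, which matches the
two columns asked for in the original question. Addresses with no leading
number come back unchanged with number set to None.
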
>>>
>>> -Steve W
>>>
>>> On 4/25/2020 12:04 AM, Stephen Woodbridge wrote:
>>> > PostGIS has an address_standardizer extension that includes
>>> > parse_address() and standardize_address() functions.
>>> >
>>> > -Steve W
>>> >
>>> > On 4/24/2020 9:54 PM, Imre Samu wrote:
>>> >> > handle addresses in postgresql
>>> >>
>>> >> maybe you can use the https://github.com/openvenues/libpostal library
>>> >> with your favorite language bindings ( Python / Ruby / Go / PHP /
>>> >> Node / R / Java ...)
>>> >>
>>> >> or as a Postgres database extension:
>>> >>
>>> >> https://info.crunchydata.com/blog/quick-and-dirty-address-matching-with-libpostal
>>> >>
>>> >> https://github.com/pramsey/pgsql-postal
>>> >>
>>> >> Regards,
>>> >> Imre
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Sat, 25 Apr 2020 at 2:49, Shaozhong SHI <shishaozhong at gmail.com>
>>> >> wrote:
>>> >>
>>> >> I find this is a simple, but important question.
>>> >>
>>> >> How best to split numbers and the rest of address?
>>> >>
>>> >> For instance, one tricky one is as follows:
>>> >>
>>> >> 21-1 Great Avenue, a city, a country, this planet
>>> >>
>>> >> How to turn this into the following:
>>> >>
>>> >> column 1 | column 2
>>> >> 21-1     | Great Avenue, a city, a country, this planet
>>> >>
>>> >> Note: there is a hyphen in 21-1
>>> >>
>>> >> Any clue?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Shao
>>> >> _______________________________________________
>>> >> postgis-users mailing list
>>> >> postgis-users at lists.osgeo.org
>>> >> https://lists.osgeo.org/mailman/listinfo/postgis-users