[PostGIS] #5695: standardize_address suspected of returning incorrect results
PostGIS
trac at osgeo.org
Thu Mar 14 20:12:57 PDT 2024
#5695: standardize_address suspected of returning incorrect results
---------------------+---------------------------
Reporter: tmcsys | Owner: pramsey
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 3.4.3
Component: postgis | Version: 3.4.x
Keywords: |
---------------------+---------------------------
standardize_address in PostGIS 3.4.1 returns different results than the
example on page 210 of PostGIS in Action, 2nd edition.
It fails to identify basic pre-directionals, instead including them in the
output as part of the house number.
The only pre-directional that appears to be correctly recognized by
standardize_address is ’S’ as ‘SOUTH’.
debug_standardize_address correctly identifies ‘E’ or ‘W’ as a directional
token.
{{{
# SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', 'ONE E
PIMA ST STE 999, TUCSON, AZ');
┌──────────┬───────────┬────────┬──────┬─────────┬──────┬─────────┬────────┬────────────┬───────┬────────┬─────────┬─────────┬──────────┬─────┬───────────┐
│ building │ house_num │ predir │ qual │ pretype │ name │ suftype │ sufdir
│ ruralroute │ extra │ city │ state │ country │ postcode │ box │
unit │
├──────────┼───────────┼────────┼──────┼─────────┼──────┼─────────┼────────┼────────────┼───────┼────────┼─────────┼─────────┼──────────┼─────┼───────────┤
│ ¤ │ 1 E │ ¤ │ ¤ │ ¤ │ PIMA │ STREET │ ¤
│ ¤ │ ¤ │ TUCSON │ ARIZONA │ USA │ ¤ │ ¤ │ SUITE
999 │
└──────────┴───────────┴────────┴──────┴─────────┴──────┴─────────┴────────┴────────────┴───────┴────────┴─────────┴─────────┴──────────┴─────┴───────────┘
(1 row)
# SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', '8907
W TROPICAIRE BLVD,NORTH PORT,FL, 34291');
┌──────────┬───────────┬────────┬──────┬─────────┬────────────┬───────────┬────────┬────────────┬───────┬────────────┬─────────┬─────────┬──────────┬─────┬──────┐
│ building │ house_num │ predir │ qual │ pretype │ name │ suftype
│ sufdir │ ruralroute │ extra │ city │ state │ country │ postcode
│ box │ unit │
├──────────┼───────────┼────────┼──────┼─────────┼────────────┼───────────┼────────┼────────────┼───────┼────────────┼─────────┼─────────┼──────────┼─────┼──────┤
│ ¤ │ 8907 W │ ¤ │ ¤ │ ¤ │ TROPICAIRE │ BOULEVARD
│ ¤ │ ¤ │ ¤ │ NORTH PORT │ FLORIDA │ USA │ 34291
│ ¤ │ ¤ │
└──────────┴───────────┴────────┴──────┴─────────┴────────────┴───────────┴────────┴────────────┴───────┴────────────┴─────────┴─────────┴──────────┴─────┴──────┘
(1 row)
# SELECT it->>'pos' AS position, it->>'word' AS word, it->>'stdword' AS
standardized_word,
[more]2-# it->>'token' AS token, it->>'token-code' AS token_code
[more]3-# FROM jsonb(
[more]3(# debug_standardize_address('us_lex', 'us_gaz',
'us_rules', '8907 W TROPICAIRE BLVD, NORTH PORT,FL, 34291')
[more]3(# ) AS s, jsonb_array_elements(s->'input_tokens') AS it;
┌──────────┬────────────┬───────────────────┬────────┬────────────┐
│ position │ word │ standardized_word │ token │ token_code │
├──────────┼────────────┼───────────────────┼────────┼────────────┤
│ 0 │ 8907 │ 8907 │ NUMBER │ 0 │
│ 0 │ 8907 │ 8907 │ QUAD │ 29 │
│ 1 │ W │ W │ SINGLE │ 18 │
│ 1 │ W │ WEST │ DIRECT │ 22 │
│ 2 │ TROPICAIRE │ TROPICAIRE │ WORD │ 1 │
│ 3 │ BLVD │ BOULEVARD │ TYPE │ 2 │
└──────────┴────────────┴───────────────────┴────────┴────────────┘
(6 rows)
# SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', '8907
S TROPICAIRE BLVD,NORTH PORT,FL, 34291');
┌──────────┬───────────┬────────┬──────┬─────────┬────────────┬───────────┬────────┬────────────┬───────┬────────────┬─────────┬─────────┬──────────┬─────┬──────┐
│ building │ house_num │ predir │ qual │ pretype │ name │ suftype
│ sufdir │ ruralroute │ extra │ city │ state │ country │ postcode
│ box │ unit │
├──────────┼───────────┼────────┼──────┼─────────┼────────────┼───────────┼────────┼────────────┼───────┼────────────┼─────────┼─────────┼──────────┼─────┼──────┤
│ ¤ │ 8907 │ SOUTH │ ¤ │ ¤ │ TROPICAIRE │ BOULEVARD
│ ¤ │ ¤ │ ¤ │ NORTH PORT │ FLORIDA │ USA │ 34291
│ ¤ │ ¤ │
└──────────┴───────────┴────────┴──────┴─────────┴────────────┴───────────┴────────┴────────────┴───────┴────────────┴─────────┴─────────┴──────────┴─────┴──────┘
(1 row)
SELECT city, country
FROM parse_address('55 Laurier Avenue East, Room 3156, Ottawa ON K1N
6N5');
}}}
This potential defect was first noticed in PostgresAPP. You can read the
original issue at https://github.com/PostgresApp/PostgresApp/issues/756
For more details expand the “My output looks like this:” button in the
reply.
--
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5695>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.
More information about the postgis-tickets
mailing list