[PostGIS] #5695: standardize_address suspected of returning incorrect results

PostGIS trac at osgeo.org
Thu Mar 14 20:12:57 PDT 2024


#5695: standardize_address suspected of returning incorrect results
---------------------+---------------------------
 Reporter:  tmcsys   |      Owner:  pramsey
     Type:  defect   |     Status:  new
 Priority:  medium   |  Milestone:  PostGIS 3.4.3
Component:  postgis  |    Version:  3.4.x
 Keywords:           |
---------------------+---------------------------
 standardize_address in PostGIS 3.4.1 returns different results than the
 example on page 210 of PostGIS in Action, 2nd edition.

 It fails to identify basic pre-directionals, instead including them in the
 output as part of the house number.

 The only pre-directional that appears to be correctly recognized by
 standardize_address is ’S’ as ‘SOUTH’.

 debug_standardize_address correctly identifies ‘E’ or ‘W’ as a directional
 token.

 {{{
 # SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', 'ONE E
 PIMA ST STE 999, TUCSON, AZ');
 ┌──────────┬───────────┬────────┬──────┬─────────┬──────┬─────────┬────────┬────────────┬───────┬────────┬─────────┬─────────┬──────────┬─────┬───────────┐
 │ building │ house_num │ predir │ qual │ pretype │ name │ suftype │ sufdir
 │ ruralroute │ extra │  city  │  state  │ country │ postcode │ box │
 unit    │
 ├──────────┼───────────┼────────┼──────┼─────────┼──────┼─────────┼────────┼────────────┼───────┼────────┼─────────┼─────────┼──────────┼─────┼───────────┤
 │ ¤        │ 1 E       │ ¤      │ ¤    │ ¤       │ PIMA │ STREET  │ ¤
 │ ¤          │ ¤     │ TUCSON │ ARIZONA │ USA     │ ¤        │ ¤   │ SUITE
 999 │
 └──────────┴───────────┴────────┴──────┴─────────┴──────┴─────────┴────────┴────────────┴───────┴────────┴─────────┴─────────┴──────────┴─────┴───────────┘
 (1 row)

 # SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', '8907
 W TROPICAIRE BLVD,NORTH PORT,FL, 34291');
 ┌──────────┬───────────┬────────┬──────┬─────────┬────────────┬───────────┬────────┬────────────┬───────┬────────────┬─────────┬─────────┬──────────┬─────┬──────┐
 │ building │ house_num │ predir │ qual │ pretype │    name    │  suftype
 │ sufdir │ ruralroute │ extra │    city    │  state  │ country │ postcode
 │ box │ unit │
 ├──────────┼───────────┼────────┼──────┼─────────┼────────────┼───────────┼────────┼────────────┼───────┼────────────┼─────────┼─────────┼──────────┼─────┼──────┤
 │ ¤        │ 8907 W    │ ¤      │ ¤    │ ¤       │ TROPICAIRE │ BOULEVARD
 │ ¤      │ ¤          │ ¤     │ NORTH PORT │ FLORIDA │ USA     │ 34291
 │ ¤   │ ¤    │
 └──────────┴───────────┴────────┴──────┴─────────┴────────────┴───────────┴────────┴────────────┴───────┴────────────┴─────────┴─────────┴──────────┴─────┴──────┘
 (1 row)

 # SELECT it->>'pos' AS position, it->>'word' AS word, it->>'stdword' AS
 standardized_word,
 [more]2-#          it->>'token' AS token, it->>'token-code' AS token_code
 [more]3-#     FROM jsonb(
 [more]3(#          debug_standardize_address('us_lex', 'us_gaz',
 'us_rules', '8907 W TROPICAIRE BLVD, NORTH PORT,FL, 34291')
 [more]3(#     ) AS s, jsonb_array_elements(s->'input_tokens') AS it;
 ┌──────────┬────────────┬───────────────────┬────────┬────────────┐
 │ position │    word    │ standardized_word │ token  │ token_code │
 ├──────────┼────────────┼───────────────────┼────────┼────────────┤
 │ 0        │ 8907       │ 8907              │ NUMBER │ 0          │
 │ 0        │ 8907       │ 8907              │ QUAD   │ 29         │
 │ 1        │ W          │ W                 │ SINGLE │ 18         │
 │ 1        │ W          │ WEST              │ DIRECT │ 22         │
 │ 2        │ TROPICAIRE │ TROPICAIRE        │ WORD   │ 1          │
 │ 3        │ BLVD       │ BOULEVARD         │ TYPE   │ 2          │
 └──────────┴────────────┴───────────────────┴────────┴────────────┘
 (6 rows)

 # SELECT * FROM standardize_address('us_lex', 'us_gaz', 'us_rules', '8907
 S TROPICAIRE BLVD,NORTH PORT,FL, 34291');
 ┌──────────┬───────────┬────────┬──────┬─────────┬────────────┬───────────┬────────┬────────────┬───────┬────────────┬─────────┬─────────┬──────────┬─────┬──────┐
 │ building │ house_num │ predir │ qual │ pretype │    name    │  suftype
 │ sufdir │ ruralroute │ extra │    city    │  state  │ country │ postcode
 │ box │ unit │
 ├──────────┼───────────┼────────┼──────┼─────────┼────────────┼───────────┼────────┼────────────┼───────┼────────────┼─────────┼─────────┼──────────┼─────┼──────┤
 │ ¤        │ 8907      │ SOUTH  │ ¤    │ ¤       │ TROPICAIRE │ BOULEVARD
 │ ¤      │ ¤          │ ¤     │ NORTH PORT │ FLORIDA │ USA     │ 34291
 │ ¤   │ ¤    │
 └──────────┴───────────┴────────┴──────┴─────────┴────────────┴───────────┴────────┴────────────┴───────┴────────────┴─────────┴─────────┴──────────┴─────┴──────┘
 (1 row)
 SELECT city, country
 FROM parse_address('55 Laurier Avenue East, Room 3156, Ottawa ON K1N
 6N5');
 }}}

 This potential defect was first noticed in PostgresAPP. You can read the
 original issue at https://github.com/PostgresApp/PostgresApp/issues/756
 For more details expand the “My output looks like this:” button in the
 reply.
-- 
Ticket URL: <https://trac.osgeo.org/postgis/ticket/5695>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.


More information about the postgis-tickets mailing list