[postgis-tickets] [PostGIS] #1118: Street pretypes that aren't directional prefixes

PostGIS trac at osgeo.org
Thu Apr 25 17:16:57 PDT 2013


#1118: Street pretypes that aren't directional prefixes
---------------------------------+------------------------------------------
 Reporter:  robe                 |       Owner:  robe          
     Type:  defect               |      Status:  new           
 Priority:  medium               |   Milestone:  PostGIS Future
Component:  pagc_address_parser  |     Version:  1.5.X         
 Keywords:                       |  
---------------------------------+------------------------------------------

Comment(by woodbri):

 This is a problem of relying on the existing standardization instead of
 standardizing the reference dataset ourselves. This is a process bug, not
 a code bug. If you take a random address and ask several people to
 standardize it into components, you will surely get different results
 because each person has a different set of rules in mind. Tiger data has
 been standardized by 3300 different counties, where it was collected and
 given to Census, so you will not find consistency even within Tiger.
 Relying on the pre-parsed standardization is the wrong way to approach
 this problem.

 The way to fix this is to load the tiger data, then clump the name
 attributes into a single string, give it to the standardizer to parse,
 and save the result. When we get a query request, we standardize it using
 the same standardizer and rules, and we match those results against our
 standardized reference set.

 Then we don't care if the standardization is right or wrong, because if it
 is wrong, it will be wrong in both cases and will still match.
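
 A minimal sketch of the process described above, not the actual geocoder
 code. The standardize() function is a toy stand-in for the real
 standardizer, and the lexicon, the column names (predir, pretype, name,
 suftype, sufdir) and the sample rows are all hypothetical. It shows the
 name attributes being clumped into one string, standardized, and saved,
 with the query passed through the same function before matching.

```python
import re

# Hypothetical toy lexicon: maps raw tokens to a standard form.
# A real standardizer uses lexicon, gazetteer and rules tables instead.
TOY_LEXICON = {
    "street": "ST", "st": "ST",
    "avenue": "AVE", "ave": "AVE",
    "north": "N", "n": "N",
    "highway": "HWY", "hwy": "HWY",
}

def standardize(text):
    """Toy stand-in for the standardizer: split into alphanumeric tokens,
    map tokens found in the lexicon, uppercase everything else."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(TOY_LEXICON.get(t, t.upper()) for t in tokens)

def build_reference(rows):
    """Clump the pre-parsed name attributes into one string, standardize it,
    and index the row ids by the standardized result."""
    index = {}
    for row in rows:
        parts = [row.get(k) for k in ("predir", "pretype", "name", "suftype", "sufdir")]
        full_name = " ".join(p for p in parts if p)
        index.setdefault(standardize(full_name), []).append(row["id"])
    return index

def lookup(index, query):
    """Standardize the query with the same function, then match."""
    return index.get(standardize(query), [])

# Hypothetical reference rows; rows 1 and 2 are the same street,
# pre-parsed differently by whoever supplied the data.
rows = [
    {"id": 1, "pretype": "Hwy", "name": "101"},
    {"id": 2, "name": "Highway 101"},
    {"id": 3, "predir": "N", "name": "Main", "suftype": "St"},
]
index = build_reference(rows)
print(lookup(index, "highway 101"))
print(lookup(index, "North Main Street"))
```

 In this sketch rows 1 and 2 end up under the same key even though the
 source data split them into columns differently, because both sides of
 the match go through the same function.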

 This process also has the benefit that you can analyze the records that
 failed to standardize because of missing lexicon, gazetteer or rules
 entries and add those that we need to improve the tools over time. This
 part can be done separately from the automated loading process. It should
 be done as part of the bug fixing and enhancements to the geocoder over
 time.
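
 The failure analysis could look roughly like the sketch below. The toy
 rule (a name parses only if its last token is a known street type), the
 TOY_TYPES table and the sample names are assumptions made up for
 illustration; the real standardizer decides failure differently. The idea
 is only to count what the failed records have in common so the most
 frequent gaps are added first.

```python
import re
from collections import Counter

# Hypothetical toy lexicon of street types; a real one is far larger.
TOY_TYPES = {"st": "ST", "street": "ST", "ave": "AVE", "avenue": "AVE"}

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def standardize(text):
    """Toy rule: a name parses only if its last token is a known street
    type. Returns the standardized string, or None on failure."""
    tokens = tokenize(text)
    if not tokens or tokens[-1] not in TOY_TYPES:
        return None
    return " ".join([t.upper() for t in tokens[:-1]] + [TOY_TYPES[tokens[-1]]])

def failure_report(names):
    """Count the trailing tokens of names that failed to standardize,
    most frequent first."""
    failed = Counter()
    for name in names:
        if standardize(name) is None:
            tokens = tokenize(name)
            failed[tokens[-1] if tokens else ""] += 1
    return failed.most_common()

# Hypothetical reference names.
names = ["Main St", "Oak Avenue", "Elm Trce", "Pine Trce", "Lake Xing"]
report = failure_report(names)
print(report)
```

 A report like this can be produced after loading, without being part of
 the loading process itself.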

 While the pagc address standardizer improves things and provides an easy
 tool to change the behavior, if you don't make this process change you
 will have an endless list of bugs like this that have nothing to do with
 the code. While you might be able to fix some of these with changes to
 lex, gaz and rules, you also might be breaking other cases that are not
 obvious when you make changes. DAMHIK.
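
 One way to catch such breakage is to compare standardizer output over a
 sample set before and after a change. The sketch below uses a toy
 token-mapping standardizer and a hypothetical lexicon change; none of it
 is taken from the actual lex, gaz or rules files.

```python
import re

def standardize(text, lexicon):
    """Toy stand-in for the standardizer: map tokens found in the lexicon,
    uppercase everything else."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(lexicon.get(t, t.upper()) for t in tokens)

def regressions(samples, old_lexicon, new_lexicon):
    """Return (sample, before, after) for every sample whose standardized
    form differs between the two lexicons."""
    changed = []
    for s in samples:
        before = standardize(s, old_lexicon)
        after = standardize(s, new_lexicon)
        if before != after:
            changed.append((s, before, after))
    return changed

old = {"street": "ST", "st": "ST"}
# Hypothetical change intended to fix names like "St Paul Ave".
new = dict(old, st="SAINT")

samples = ["St Paul Ave", "Main St", "Oak Street"]
diff = regressions(samples, old, new)
for sample, before, after in diff:
    print(f"{sample!r}: {before!r} -> {after!r}")
```

 Here the change does what was intended for "St Paul Ave" but also turns
 "Main St" into "MAIN SAINT", which is the kind of side effect that is not
 obvious from the change itself.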

 I know the plan is to move forward without making this process change, but
 it should be planned for sometime in the future.

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1118#comment:8>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

