[postgis-tickets] [PostGIS] #1614: More normalizer tricks

PostGIS trac at osgeo.org
Mon Dec 10 12:26:57 PST 2012


#1614: More normalizer tricks
----------------------------+-----------------------------------------------
 Reporter:  mikepease       |       Owner:  robe         
     Type:  defect          |      Status:  new          
 Priority:  medium          |   Milestone:  PostGIS 2.1.0
Component:  tiger geocoder  |     Version:  1.5.X        
 Keywords:                  |  
----------------------------+-----------------------------------------------

Comment(by woodbri):

 When I loaded and standardized all of Tiger for the whole US using the
 PAGC standardizer, I looked at the records that failed to standardize so I
 could add entries to the lexicon, gazetteer, and parser rules. I found a
 lot of garbage in these records: things like the COUNTY 20 RD vs COUNTY
 RD 20 case you mention above, and things like the street type appearing
 in BOTH the name and the type fields. I think there were about 9000
 records out of 50 million, so I have not waded through them yet as I had
 other higher-priority items. I also think some simple regex checking and
 cleaning of these in the loading process is the best way to deal with
 them. In other words, spend the time once to deal with these, so the
 search code is cleaner, simpler, and faster.
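
 To make the load-time cleaning idea concrete, here is a rough sketch in
 Python of the two cases mentioned above. It is not code from PAGC or the
 tiger geocoder loader; the function names, the regex, and the assumption
 that the name and type arrive as separate strings are all hypothetical,
 and real Tiger data would need a much longer list of patterns.

```python
import re

# Hypothetical pattern for names like "COUNTY 20 RD" where the route
# number sits between the prefix and the road type.
_COUNTY_NUM_TYPE = re.compile(
    r"^(COUNTY)\s+(\d+[A-Z]?)\s+(RD|ROAD)$", re.IGNORECASE
)


def reorder_county_road(full_name: str) -> str:
    """Rewrite 'COUNTY 20 RD' as 'COUNTY RD 20'; leave other names alone."""
    m = _COUNTY_NUM_TYPE.match(full_name.strip())
    if not m:
        return full_name
    prefix, number, road_type = m.groups()
    return f"{prefix} {road_type} {number}"


def strip_duplicate_type(name: str, street_type: str) -> str:
    """Drop a trailing street type from name when it repeats the type field."""
    if not street_type:
        return name
    # Only match the type as a separate trailing word, optionally with a dot.
    pattern = re.compile(r"\s+" + re.escape(street_type) + r"\.?$", re.IGNORECASE)
    cleaned = pattern.sub("", name.strip())
    # Never return an empty name; fall back to the original value.
    return cleaned if cleaned else name
```

 Running both functions once while loading, rather than at query time,
 is the point of the suggestion: the normalizer and search code then only
 ever see the cleaned form.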

-- 
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1614#comment:6>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.

