[postgis-tickets] [PostGIS] #1614: More normalizer tricks
PostGIS
trac at osgeo.org
Mon Dec 10 12:26:57 PST 2012
#1614: More normalizer tricks
----------------------------+-----------------------------------------------
Reporter: mikepease | Owner: robe
Type: defect | Status: new
Priority: medium | Milestone: PostGIS 2.1.0
Component: tiger geocoder | Version: 1.5.X
Keywords: |
----------------------------+-----------------------------------------------
Comment(by woodbri):
When I loaded and standardized all of Tiger for the whole US using the
PAGC standardizer, I looked at the records that failed to standardize so I
could add entries to the lexicon, gazetteer, and parser rules. I found a
lot of garbage in these records: things like the case you mention above,
COUNTY 20 RD vs COUNTY RD 20, and the street type appearing in BOTH the
name and the type fields. I think there were about 9000 such records out
of 50 million, so I have not waded through them yet, as I had other
higher-priority items. I also think some simple regex checking and
cleaning of these in the loading process is the best way to deal with
them. In other words, spend the time once to deal with these, so the
search code is cleaner, simpler, and faster.
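
To illustrate the kind of load-time cleanup meant here, below is a minimal
Python sketch, not the actual loader code. The function names, the
STREET_TYPES list, and the restriction to the COUNTY prefix are all
assumptions made for the example; a real loader would presumably draw its
street types from the standardizer's lexicon and run the checks in SQL.

```python
import re

# Hypothetical list of street-type abbreviations, for illustration only.
STREET_TYPES = ["AVE", "BLVD", "DR", "HWY", "LN", "RD", "ST"]

# Matches names such as "COUNTY 20 RD": prefix, route number, street type.
_TRANSPOSED = re.compile(
    r"^(COUNTY)\s+(\d+)\s+(" + "|".join(STREET_TYPES) + r")$",
    re.IGNORECASE,
)


def fix_transposed_route(name):
    """Rewrite 'COUNTY 20 RD' as 'COUNTY RD 20'.

    Names that do not match the pattern are returned unchanged apart from
    trimming leading and trailing whitespace.
    """
    return _TRANSPOSED.sub(r"\1 \3 \2", name.strip())


def strip_duplicate_type(name, street_type):
    """Drop a trailing street type from the name if the type field repeats it.

    The comparison is case-insensitive. A name consisting only of the type
    (for example 'ST') is kept, so the name field never becomes empty.
    Runs of whitespace inside the name are collapsed to single spaces.
    """
    tokens = name.split()
    wanted = street_type.strip().upper()
    if wanted and len(tokens) > 1 and tokens[-1].upper() == wanted:
        tokens = tokens[:-1]
    return " ".join(tokens)
```

Run once per record during loading, checks like these would keep such
special cases out of the search code, which is the point made above.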
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1614#comment:6>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.