[postgis-tickets] [PostGIS] #1118: Street pretypes that aren't directional prefixes
PostGIS
trac at osgeo.org
Thu Apr 25 17:16:57 PDT 2013
#1118: Street pretypes that aren't directional prefixes
---------------------------------+------------------------------------------
Reporter: robe | Owner: robe
Type: defect | Status: new
Priority: medium | Milestone: PostGIS Future
Component: pagc_address_parser | Version: 1.5.X
Keywords: |
---------------------------------+------------------------------------------
Comment(by woodbri):
This is a problem of not standardizing the reference dataset and relying
on the existing standardization. This is a process bug, not a code bug. If
you take a random address and ask several people to standardize it into
components, you will surely get different results because each person
will have a different set of rules in mind. Tiger data was standardized
by the 3300 different counties where it was collected before being given
to Census, so you will not even find consistency within Tiger. Relying on
the pre-parsed standardization is therefore the wrong way to approach
this problem.
The way to fix this is to load the tiger data, then clump the name
attributes into a single string and give it to the standardizer to parse
and then save that. When we get a query request, we standardize that using
our same standardizer and rules and we match those results against our
standardized reference set.
Then we don't care if the standardization is right or wrong, because if it
is wrong, it will be wrong in both cases and will still match.
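
A minimal sketch of that process, assuming a toy standardizer. The
standardize(), build_reference() and lookup() functions, the lexicon
tables and the (id, predir, pretype, name, suftype) row layout are all
hypothetical and are not the PAGC standardizer or the Tiger schema:

```python
# Hypothetical toy lexicon; a real standardizer has far richer lex/gaz/rules.
DIRECTIONS = {"N": "N", "NORTH": "N", "S": "S", "SOUTH": "S",
              "E": "E", "EAST": "E", "W": "W", "WEST": "W"}
TYPES = {"ST": "ST", "STREET": "ST", "AVE": "AVE", "AVENUE": "AVE",
         "RD": "RD", "ROAD": "RD"}


def standardize(raw):
    """Parse one street-name string into (predir, name, suftype)."""
    tokens = raw.upper().replace(".", "").split()
    predir = ""
    suftype = ""
    # Only strip a direction or type if something is left over for the name.
    if len(tokens) > 1 and tokens[0] in DIRECTIONS:
        predir = DIRECTIONS[tokens.pop(0)]
    if len(tokens) > 1 and tokens[-1] in TYPES:
        suftype = TYPES[tokens.pop()]
    return (predir, " ".join(tokens), suftype)


def build_reference(rows):
    """Clump each row's pre-parsed fields into one string and re-standardize.

    rows: iterable of (id, predir, pretype, name, suftype) tuples
    (an assumed layout, for illustration only).
    """
    index = {}
    for row_id, *parts in rows:
        clumped = " ".join(p for p in parts if p)
        index.setdefault(standardize(clumped), []).append(row_id)
    return index


def lookup(index, query):
    """Standardize the query with the same function, then match exactly."""
    return index.get(standardize(query), [])


# Rows 1 and 2 were pre-parsed inconsistently by their (imagined) sources.
rows = [
    (1, "", "", "North Main", "St"),
    (2, "N", "", "Main", "Street"),
    (3, "", "Hwy", "10", ""),
]
index = build_reference(rows)
print(lookup(index, "north main st."))  # ids 1 and 2
print(lookup(index, "hwy 10"))          # id 3
```

The toy standardizer knows nothing about pretypes, so it folds "HWY"
into the street name. That parse is arguably wrong, but it is wrong the
same way for the reference row and for the query, so the two still match.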
This process also has the benefit that you can analyze those records that
failed to standardize because of a missing lexicon, gazetteer or rules
entry and add the entries we need to improve the tools over time. This
part can be done separately from the automated loading process. It should
be done as part of the bug fixing and enhancements to the geocoder over
time.
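
One way such an offline review might start is sketched below: tally the
leading token of each street name and list the frequent ones that the
lexicon does not know, as candidate pretypes for a human to look at. The
candidate_pretypes() function, the KNOWN_TYPES set and the min_count
threshold are hypothetical, and this heuristic is an assumption, not
something the standardizer provides:

```python
from collections import Counter

# Hypothetical lexicon of already-known type tokens.
KNOWN_TYPES = {"ST", "AVE", "RD"}


def candidate_pretypes(raw_names, known=KNOWN_TYPES, min_count=2):
    """Return (token, count) pairs for frequent unknown leading tokens."""
    counts = Counter()
    for raw in raw_names:
        tokens = raw.upper().replace(".", "").split()
        # Skip single-token names: the only token must be the name itself.
        if len(tokens) > 1 and tokens[0] not in known:
            counts[tokens[0]] += 1
    return [(tok, n) for tok, n in counts.most_common() if n >= min_count]


names = ["Highway 10", "Highway 20", "Camino Real",
         "Camino Alto", "Oak St", "Highway 5"]
print(candidate_pretypes(names))
```

Anything this reports still needs a person to decide whether it belongs
in the lexicon, since a frequent leading word can also be part of an
ordinary street name.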
While the pagc address standardizer improves things and provides some
easy tools to change the behavior, if you don't make this process change
you will have an endless list of bugs like this that have nothing to do
with the code. While you might be able to fix some of these with changes
to lex, gaz and rules, you also might be breaking other cases that are
not obvious when you make changes. DAMHIK.
I know the plan is to move forward without making this process change, but
it should be planned for sometime in the future.
--
Ticket URL: <http://trac.osgeo.org/postgis/ticket/1118#comment:8>
PostGIS <http://trac.osgeo.org/postgis/>
The PostGIS Trac is used for bug, enhancement & task tracking, a user and developer wiki, and a view into the subversion code repository of PostGIS project.