[postgis-users] A more practical geocoder
Stephen Woodbridge
woodbri at swoodbridge.com
Tue Nov 13 10:07:39 PST 2007
Ditto on what Stephen Frost said.
-Steve W
Jason Horning wrote:
>
>
> To whom it may concern,
>
>
>
> When we started to prototype our proof-of-concept web map using PostGIS
> we needed a geocoder, but couldn't find one to fit our needs. We tried
> the Tiger Geocoder, but its data requirements seemed excessive, and we
> also needed to operate on the custom GIS data being developed by state
> and local government GIS departments.
> So, based on our fairly intimate understanding of how people do address
> searches, we set out to prototype our own geocoder. Our geocoder
> functions somewhat differently from others we have worked with.
>
>
>
> One notable difference: we rely more on pattern matching than on
> normalization in the source data. For example, our geocoder does not
> require road names in the roads table to be broken down into: prefix
> direction, prefix type, street name, suffix type, and suffix direction.
> We require only that a given road segment have a label. So, for example,
> with other geocoders, a segment of "Main Street NorthEast" would be
> attributed like so: prefix direction = "", prefix type = "", street
> name = "Main", suffix type = "Street", suffix direction = "NorthEast".
> Our geocoder allows the segment of road to be attributed with only:
> label="Main Street NorthEast". We use pattern matching techniques to
> normalize that data when we create the geocoding indexes. We then use
> the same pattern matching techniques to normalize user input when
> someone searches for "123 Main Street NE". While this may not be
> entirely revolutionary, we do get good matches. We firmly believe that
> simplifying the data model to allow the computer (instead of the GIS
> analyst) to do the normalization is less error-prone and can have other
> side benefits as well. We perform interpolation along line segments in
> the usual way, using LeftFrom, RightFrom, LeftTo and RightTo attributes.
> Finally, we do not require zone information (Zip Left, Zip Right,
> Community Left, Community Right, etc.) on the road segments themselves
> and instead rely on the spatial relationship of a road segment to the
> zone (polygon) it intersects or is within.
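A minimal sketch of the label-normalization idea described above, in Python rather than the Perl of the original. The abbreviation tables and the function names (normalize_label, parse_query) are assumptions made for illustration, not taken from the BullBerry code. The point it shows is that the same routine is applied to the road label when the index is built and to the user's input at search time, so both reduce to the same key.

```python
import re

# Hypothetical abbreviation tables; a real deployment would use much fuller
# lists than these few illustrative entries.
DIRECTIONS = {
    "north": "N", "south": "S", "east": "E", "west": "W",
    "northeast": "NE", "northwest": "NW",
    "southeast": "SE", "southwest": "SW",
}
STREET_TYPES = {
    "street": "ST", "avenue": "AVE", "road": "RD",
    "boulevard": "BLVD", "drive": "DR", "lane": "LN",
}


def normalize_label(text):
    """Reduce a free-form street label to a canonical matching key."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    out = []
    for tok in tokens:
        if tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])
        elif tok in STREET_TYPES:
            out.append(STREET_TYPES[tok])
        else:
            # Already-abbreviated or unknown tokens are kept, upper-cased.
            out.append(tok.upper())
    return " ".join(out)


def parse_query(query):
    """Split user input into (house_number, normalized street key)."""
    match = re.match(r"\s*(\d+)\s+(.*)", query)
    if match is None:
        return None, normalize_label(query)
    return int(match.group(1)), normalize_label(match.group(2))


index_key = normalize_label(" Main Street NorthEast ")
number, search_key = parse_query(" 123 Main Street NE ")
print(index_key, "|", number, search_key)
```

Because both sides pass through the same function, a street whose name is itself a direction word (say "North Street") is abbreviated consistently on both sides and still matches.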
>
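The interpolation step mentioned above could look roughly like the following Python sketch. It assumes a segment is a list of (x, y) vertices and that each side of the street carries one parity (odd or even). Both are assumptions for illustration, and pick_range and interpolate are hypothetical names, not functions from the original code.

```python
import math


def pick_range(number, left_from, left_to, right_from, right_to):
    """Return (side, from, to) for the range containing `number`, or None."""
    for side, lo, hi in (("left", left_from, left_to),
                         ("right", right_from, right_to)):
        in_range = min(lo, hi) <= number <= max(lo, hi)
        # Assumption: each side of the street carries a single parity.
        if in_range and number % 2 == lo % 2:
            return side, lo, hi
    return None


def interpolate(number, range_from, range_to, vertices):
    """Place `number` proportionally along a polyline of (x, y) vertices."""
    if range_to == range_from:
        fraction = 0.5  # single-address range: use the midpoint
    else:
        fraction = (number - range_from) / (range_to - range_from)
    lengths = [math.dist(a, b) for a, b in zip(vertices, vertices[1:])]
    target = fraction * sum(lengths)
    # Walk the polyline until the target distance falls inside a piece.
    for (x0, y0), (x1, y1), seg in zip(vertices, vertices[1:], lengths):
        if seg > 0 and target <= seg:
            t = target / seg
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        target -= seg
    return vertices[-1]


side, lo, hi = pick_range(123, 101, 199, 100, 198)
print(side, interpolate(123, lo, hi, [(0.0, 0.0), (10.0, 0.0)]))
```

This only places the point on the centerline; offsetting it to the left or right side of the road is left out of the sketch.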
>
>
> To find and sort matches by relevance, we use a multi-step incremental
> scoring algorithm (matching the street name perfectly earns a lot of
> points, a soundex match earns a few, and there are bonus points for
> being in the correct postal zone, etc.). We use a number of
> heuristics that have come out of our general experience. Those
> heuristics are evident in the code.
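As a rough illustration of incremental scoring of this kind, here is a Python sketch. The weights, the dictionary layout of a candidate, and the function names are all invented for the example. The actual heuristics live in the Perl code and are not reproduced here. The soundex function is a plain implementation of standard American Soundex.

```python
# Hypothetical weights; the real values are whatever the original code uses.
WEIGHTS = {"exact_name": 100, "soundex_name": 20, "zone": 30}

_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Standard American Soundex code, e.g. 'Robert' -> 'R163'."""
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = _CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate repeated codes
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]


def score_candidate(query, candidate):
    """Add up points step by step; 0 means the street name did not match."""
    score = 0
    if query["street"] == candidate["street"]:
        score += WEIGHTS["exact_name"]
    elif soundex(query["street"]) == soundex(candidate["street"]):
        score += WEIGHTS["soundex_name"]
    if score == 0:
        return 0
    # Bonus only on top of a name match, never on its own.
    if query.get("zone") and query["zone"] == candidate.get("zone"):
        score += WEIGHTS["zone"]
    return score


def rank(query, candidates):
    """Return (score, candidate) pairs, best first, dropping non-matches."""
    scored = [(score_candidate(query, c), c) for c in candidates]
    return sorted((s for s in scored if s[0] > 0),
                  key=lambda s: s[0], reverse=True)
```

A query for street "MAIN" in a given zone would then rank an exact "MAIN" in that zone above a sound-alike such as "MANE", and drop unrelated streets entirely.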
>
>
>
> Essentially, we have approached geocoding as a natural language parsing
> problem. Our geocoder is built on USPS standards, but it would be
> possible to generalize it to work with other locales. The key
> difference in the way we have approached
> the problem is that the street name data is not normalized and as such
> there is no requirement to break it up into components that only make
> sense to a subset of localities.
>
>
>
> While we understand that what we have is still essentially a prototype,
> we are thinking that what we have done to date could be of some use to
> the people who are focussed on mapping and geocoding projects and would
> be happy to provide it for the inspection of others. It is implemented
> in fairly well-commented (and relatively standard) Perl, so we think the
> code should be readable by most coders working in this area.
>
>
>
> Jason Horning
>
> BullBerry Systems, Inc.
>