[postgis-users] A more practical geocoder

Stephen Woodbridge woodbri at swoodbridge.com
Tue Nov 13 10:07:39 PST 2007


Ditto on what Stephen Frost said.

-Steve W

Jason Horning wrote:
> 
> 
> To whom it may concern,
> 
>  
> 
> When we started to prototype our proof-of-concept web map using PostGIS, 
> we needed a geocoder but couldn't find one to fit our needs.  We tried 
> the Tiger Geocoder, but its data requirements seemed excessive, and we 
> also needed to operate on the custom GIS data being developed by state 
> and local government GIS departments.  So, based on our fairly intimate 
> understanding of how people do address searches, we set out to prototype 
> our own geocoder.  It functions somewhat differently from others we have 
> worked with.
> 
>  
> 
> One notable difference: we rely more on pattern matching than on 
> normalization in the source data.  For example, our geocoder does not 
> require road names in the roads table to be broken down into: prefix 
> direction, prefix type, street name, suffix type, and suffix direction.  
> We require only that a given road segment have a label. So, for example, 
> with other geocoders, a segment of "Main Street NorthEast" would be 
> attributed like so:  prefix direction = "", prefix type = "", street 
> name = "Main", suffix type = "Street", suffix direction = "NorthEast".  
> Our geocoder allows the segment of road to be attributed with only: 
> label = "Main Street NorthEast".  We use pattern matching techniques to 
> normalize that data when we create the geocoding indexes.  We then use 
> the same pattern matching techniques to normalize user input when 
> someone searches for "123 Main Street NE".  While this may not be 
> entirely revolutionary, we do get good matches.  We firmly believe that 
> simplifying the data model to allow the computer (instead of the GIS 
> analyst) to do the normalization is less error-prone and can have other 
> side benefits as well.  We interpolate along line segments in the 
> usual fashion, using LeftFrom, RightFrom, LeftTo and RightTo address 
> ranges.  
> Finally, we do not require zone information (Zip Left, Zip Right, 
> Community Left, Community Right, etc.) on the road segments themselves 
> and instead rely on the spatial relationship of a road segment to the 
> zone (polygon) it intersects or is within.
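
A minimal sketch of the pattern-matching idea described above, written in
Python rather than the Perl the post mentions, and not the authors' actual
code. The abbreviation tables and the function names normalize_label and
split_house_number are hypothetical. The only point illustrated is that the
same normalization is applied to the stored label and to the user's query,
so both reduce to one comparable key.

```python
import re

# Hypothetical abbreviation tables with only a few entries for illustration;
# a real deployment would need far more complete lists.
DIRECTIONS = {
    "north": "N", "south": "S", "east": "E", "west": "W",
    "northeast": "NE", "northwest": "NW",
    "southeast": "SE", "southwest": "SW",
}
STREET_TYPES = {
    "street": "ST", "avenue": "AVE", "road": "RD",
    "boulevard": "BLVD", "drive": "DR", "lane": "LN",
}


def normalize_label(label):
    """Reduce a free-form street label to a canonical uppercase key."""
    # Lowercase, then keep only runs of letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    out = []
    for tok in tokens:
        if tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])
        elif tok in STREET_TYPES:
            out.append(STREET_TYPES[tok])
        else:
            # Unknown words and already-abbreviated forms pass through.
            out.append(tok.upper())
    return " ".join(out)


def split_house_number(query):
    """Split a leading house number from the rest of the query, if present."""
    m = re.match(r"\s*(\d+)\s+(.*)", query)
    if m:
        return int(m.group(1)), m.group(2)
    return None, query.strip()


# The same function normalizes both the indexed label and the user input.
indexed_key = normalize_label(" Main Street NorthEast ")
number, rest = split_house_number(" 123 Main Street NE ")
query_key = normalize_label(rest)
print(number, indexed_key, query_key)
```

With these tables, the stored label and the remainder of the query both
reduce to the same key. A street actually named "North Street" would collapse
to "N ST" on both sides, which still matches but loses information, so a real
implementation would need position-aware rules.
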
> 
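
The interpolation along LeftFrom/LeftTo and RightFrom/RightTo ranges
mentioned above could look roughly like the following Python sketch. It rests
on assumptions the post does not confirm: that a house number belongs to the
side whose range contains it with matching odd/even parity, and that the
result is placed on the segment centerline with no offset to either side.
The function names interpolate_address and point_along are hypothetical.

```python
import math


def point_along(coords, fraction):
    """Return the point at the given fraction (0..1) of a polyline's length."""
    lengths = [math.dist(a, b) for a, b in zip(coords, coords[1:])]
    target = fraction * sum(lengths)
    for (a, b), seg_len in zip(zip(coords, coords[1:]), lengths):
        if seg_len > 0 and target <= seg_len:
            t = target / seg_len
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1]))
        target -= seg_len
    # Floating-point slack at the far end: fall back to the last vertex.
    return tuple(coords[-1])


def interpolate_address(number, left_from, left_to, right_from, right_to, coords):
    """Return (x, y, side) for a house number on a segment, or None."""
    sides = (("L", left_from, left_to), ("R", right_from, right_to))
    for side, lo, hi in sides:
        if lo is None or hi is None:
            continue  # no address range recorded on this side
        in_range = min(lo, hi) <= number <= max(lo, hi)
        same_parity = number % 2 == lo % 2  # assumed odd/even convention
        if in_range and same_parity:
            # A single-address range is placed at the midpoint.
            fraction = 0.5 if hi == lo else (number - lo) / (hi - lo)
            x, y = point_along(coords, fraction)
            return (x, y, side)
    return None
```

For example, on a straight segment from (0, 0) to (100, 0) with a left range
of 1 to 101 and a right range of 0 to 100, house number 51 falls on the left
side, halfway along the segment.
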
>  
> 
> To find and sort matches by relevance, we use a multi-step incremental 
> scoring algorithm (matching the street name perfectly earns a lot of 
> points, a Soundex match earns a few, being in the correct postal zone 
> earns bonus points, etc.).  We use a number of heuristics that have 
> come out of our general experience.  Those heuristics are evident in 
> the code.
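
As an illustration of the incremental scoring described above, here is a
small Python sketch. The weights (100, 20, 10) and the names
score_candidate and rank are invented for this example and are not taken
from the authors' code; the Soundex routine is the common American Soundex
variant, written out so the block is self-contained.

```python
# Letter-to-digit table for American Soundex.
SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {
        "1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
        "4": "L", "5": "MN", "6": "R",
    }.items()
    for ch in letters
}


def soundex(word):
    """Four-character American Soundex code, or '' for empty input."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    code = word[0]
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def score_candidate(query_name, query_zip, cand_name, cand_zip):
    """Accumulate points for one candidate; weights are hypothetical."""
    score = 0
    if query_name == cand_name:
        score += 100  # exact street-name match earns the most
    elif soundex(query_name) == soundex(cand_name):
        score += 20   # sound-alike match earns a few points
    if query_zip and query_zip == cand_zip:
        score += 10   # bonus for the correct postal zone
    return score


def rank(query_name, query_zip, candidates):
    """Return (score, name, zip) tuples with a positive score, best first."""
    scored = [
        (score_candidate(query_name, query_zip, name, zone), name, zone)
        for name, zone in candidates
    ]
    return sorted((s for s in scored if s[0] > 0),
                  key=lambda s: s[0], reverse=True)
```

The inputs here are bare street names (for example "MAIN"), assumed to have
been normalized already; scoring additional components such as street type
or direction would follow the same additive pattern.
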
> 
>  
> 
> Essentially, we have approached geocoding as a natural language parsing 
> problem.  Our geocoder is built around USPS postal standards, but it 
> would be possible to generalize it to work with other locales.  The 
> key difference in our approach is that the street name data is not 
> normalized, so there is no requirement to break it up into components 
> that only make sense to a subset of localities.
> 
>  
> 
> While we understand that what we have is still essentially a prototype, 
> we think that what we have done to date could be of use to people 
> focused on mapping and geocoding projects, and we would be happy to 
> provide it for inspection.  It is implemented in fairly well-commented 
> (and relatively standard) Perl, so the code should be readable by most 
> coders working in this area.
> 
>  
> 
> Jason Horning
> 
> BullBerry Systems, Inc.
