[mapserver-users] Flexible queries (to support address lookups)

Steve Lime Steve.Lime at dnr.state.mn.us
Thu Sep 18 12:03:48 EDT 2008


Like I said, there are other users with far more knowledge in this area... ;-)

Steve

>>> On 9/17/2008 at 2:22 PM, in message <48D158FA.7040300 at swoodbridge.com>, Stephen
Woodbridge <woodbri at swoodbridge.com> wrote:
> This is one approach to the problem, but it does not deal with the real 
> problems of matching user entered addresses with addresses encoded on 
> street segments.
> 
> For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44, 
> Highway 44, State Highway 44, Rt 44, and various other abbreviations for 
> Highway, simple typos, adding N, N., North, S, S., South, etc. 
> designations to the Highway, adding Alt., Bus., Byp., etc., and on it 
> goes. You also need to deal with accented characters that are sometimes 
> entered without accents.
> 
> In a geocoder, you typically have a standardizer that sorts out all that 
> craziness. Then when you load the geocoder, you standardize the vendor 
> data and store it in a standard form. When you get a geocode request you 
> standardize the incoming request and then try to match the standard form 
> with the vendor data which is also in standard form.
> 
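A minimal sketch of the standardize-then-match idea described above, in Python. The abbreviation table and the function name standardize are hypothetical and far smaller than anything a production standardizer would need; the point is only that vendor data and incoming requests pass through the same function before being compared.

```python
import re
import unicodedata

# Hypothetical abbreviation table; a real standardizer needs a far larger one.
ABBREVIATIONS = {
    "north": "n", "south": "s", "east": "e", "west": "w",
    "street": "st", "avenue": "ave", "road": "rd",
    "highway": "hwy", "route": "rt", "alabama": "al",
}


def standardize(address: str) -> str:
    """Reduce an address string to a crude canonical form."""
    # Strip accents so accented and unaccented spellings compare equal.
    decomposed = unicodedata.normalize("NFKD", address)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, then keep only runs of ASCII letters and digits as tokens.
    tokens = re.findall(r"[a-z0-9]+", unaccented.lower())
    # Replace long forms with their abbreviations, token by token.
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)
```

With this table, "100 South James Ave." and "100 S. James Avenue" reduce to the same string, while "AL-44" and "Alabama Highway 44" still do not, which is the kind of case a real standardizer has to handle with additional rules.
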
> You can also use techniques like Metaphone/Soundex codes to do fuzzy 
> searching and then use Levenshtein distance to score the possible 
> matched results for how close they are to the request.
> 
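As a rough illustration of the fuzzy search and scoring step, the Python sketch below uses a plain American Soundex key to collect candidates and Levenshtein distance to rank them. The function fuzzy_matches and the choice of Soundex over Metaphone are assumptions made for the example, not a description of any particular geocoder.

```python
# Soundex digit for each consonant; vowels, h, w and y have no code.
CODES = {
    c: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for c in letters
}


def soundex(word: str) -> str:
    """Return the four-character American Soundex key of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_matches(query: str, names: list) -> list:
    """Names sharing the query's Soundex key, closest spelling first."""
    key = soundex(query)
    hits = [n for n in names if soundex(n) == key]
    return sorted(hits, key=lambda n: levenshtein(query.lower(), n.lower()))
```

The phonetic key keeps the candidate set small, and the edit distance then orders those candidates, so a caller can present several possible matches instead of a single answer.
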
> You need to be prepared to handle multiple results to a query, for 
> example you search for Oak St. but only find North Oak Street and South 
> Oak Street.
> 
> Also, what are you going to search? Your whole dataset, or are you also 
> going to want to filter it by city, state, postal code, or country? In this 
> case you also need to be able to parse the full address into all these 
> additional terms and filter your search to those appropriate to that 
> limited region.
> 
> It makes much more sense to load the appropriate data records into a 
> relational database and make the queries in SQL. If you do not want to 
> use a full-blown database like PostgreSQL or MySQL, then look at SQLite, 
> which is a wonderful embedded database with zero management and has 
> bindings for C, Perl, Python, PHP, Tcl, etc.
> 
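A small sketch of the SQLite route using Python's built-in sqlite3 module. The table name segments, its columns, and the sample rows are all invented for illustration; the optional city argument shows how a search could be narrowed to a limited region.

```python
import sqlite3

# In-memory database with a hypothetical schema and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE segments ("
    "id INTEGER PRIMARY KEY, number INTEGER, prefix TEXT, "
    "name TEXT, type TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO segments (number, prefix, name, type, city) VALUES (?, ?, ?, ?, ?)",
    [
        (100, "S", "James", "Ave", "Minneapolis"),
        (100, "N", "James", "Ave", "Minneapolis"),
        (100, "", "James", "St", "Saint Paul"),
    ],
)


def find_address(conn, number, name, city=None):
    """Case-insensitive lookup by house number and street name."""
    sql = (
        "SELECT number, prefix, name, type, city FROM segments "
        "WHERE number = ? AND name = ? COLLATE NOCASE"
    )
    params = [number, name]
    if city is not None:
        # Optional regional filter, as suggested for city/state/postal code.
        sql += " AND city = ? COLLATE NOCASE"
        params.append(city)
    return conn.execute(sql + " ORDER BY id", params).fetchall()
```

Calling find_address(conn, 100, "james") returns all three sample rows, which is the multiple-results situation mentioned earlier; passing city="saint paul" narrows it to one.
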
> My two cents, from someone that has his head and fingers in too many 
> geocoders.
> 
> -Steve
>   http://imaptools.com/ 
> 
> Steve Lime wrote:
>> I think we'd need a fuzzy match operator, probably one specific to address
>> matching. This would involve adding one or more C functions to compare two
>> address strings and then tweaking the MapServer yacc grammar to recognize the
>> new operator. The trick would obviously be to write the C function, and there
>> are folks on the list with considerable experience with that problem.
>> 
>> If you HAD that operator then presumably you could write different filters
>> depending on your data, e.g.:
>> 
>>   ('user entered address' addreq '[address column]') or
>>   ('user entered address' addreq '[prefix] [name] [type] [suffix]')
>> 
>> That would be faster than trying to manipulate the current operators. You
>> could also do a very generic query, like a case-insensitive lookup on the
>> street name, and then operate on that result set in your application to deal
>> with data differences.
>> 
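One way the "generic query, then refine in the application" idea might look, sketched in Python. It assumes the generic lookup has already returned candidate address strings; the helper names tokens and refine are hypothetical. A candidate is kept when it contains every token the user typed, so missing street types or directions in the input do not prevent a match.

```python
def tokens(text: str) -> list:
    """Lowercased words with surrounding periods and commas removed."""
    stripped = (t.strip(".,").lower() for t in text.split())
    return [t for t in stripped if t]


def refine(user_input: str, candidates: list) -> list:
    """Keep candidates that contain every token the user entered."""
    wanted = set(tokens(user_input))
    return [c for c in candidates if wanted <= set(tokens(c))]
```

For example, refine("100 James", ["100 South James Ave", "200 James Ave"]) keeps only the first candidate. This does not handle abbreviations such as "S" versus "South"; those would need a standardization step first.
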
>> Steve
>> 
>>>>> "Emerson, Gabe" <gemerson at WelshCo.com> 09/17/08 10:16 AM >>>
>> Hi All,
>> 
>> I have an interesting mini-project which some of you might have dealt
>> with before, I'd be interested in any suggestions. 
>> 
>> I'd like to run a query (presented to users as an address search),
>> across multiple layers. For example, after an address is entered, the
>> system first searches an in-house dataset, if there are no matches it
>> searches a county parcel dataset, and if both fail, it tries to map the
>> address via a geocoding API.
>>  
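The fallback order described above (in-house data, then county parcels, then a geocoding API) could be sketched in Python as below. The function first_match and the idea of passing each source as a labeled search callable are assumptions for illustration; how each search is actually implemented is left open.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A search callable takes the entered address and returns a list of matches.
Search = Callable[[str], List[str]]


def first_match(
    address: str, sources: Sequence[Tuple[str, Search]]
) -> Optional[Tuple[str, List[str]]]:
    """Try each source in order and return the first non-empty result."""
    for label, search in sources:
        results = search(address)
        if results:
            return label, results
    return None  # every source came back empty
```

The caller would pass the sources in priority order, for example [("in-house", search_inhouse), ("county", search_county), ("geocoder", search_api)], where those three functions are placeholders for the real lookups.
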
>> The issue I'm running into is that each of the layers stores addresses a
>> little differently. The in-house set tends to be sloppy about
>> punctuation and things like directions ('N' vs 'North', 'St' vs 'ST.' vs
>> 'Street',  etc).  The county is more standardized but breaks everything
>> up into street prefix, name, type, suffix, etc. (Minnesota Met Council,
>> for those of you familiar with it). In addition, users tend not to enter
>> addresses the same way twice, and to leave out things like the street
>> type and direction.
>> 
>> I'm wondering if there's a way to relax the query matches so that
>> something like "100 James" will return a match from a DBF containing
>> "100 South James Ave", or a set of columns like "100" "S" "James" "Ave".
>> Something along the lines of  The Geocoding API is flexible in this way,
>> so one solution I considered is to use it as an address parser and then
>> use the returned X,Y data for an itemquery on each of the layers. The
>> problems with that are slower performance and possible API
>> unavailability. 
>> 
>> Currently I'm using Mapserver in CGI mode with some Javascript for
>> frontend logic and custom tools. I developed the application this way
>> for various reasons, but am considering moving to PHP Mapscript for
>> better performance.  If something like this is possible with the CGI
>> approach I'd love to hear about it, but I'd also be interested in
>> mapscript ideas or examples. 
>> 
>> Thanks!
>> 
>> -Gabe
>> 
>> Gabe Emerson
>> Research Department
>> Welsh Companies
>> 4350 Baker Road, Suite 400
>> Minnetonka, MN 55343-8695
>> 952-897-7700, ext. 1306
>> gemerson at welshco.com 
>> 
>> _______________________________________________
>> mapserver-users mailing list
>> mapserver-users at lists.osgeo.org 
>> http://lists.osgeo.org/mailman/listinfo/mapserver-users

