[mapserver-users] Flexible queries (to support address lookups)
Stephen Woodbridge
woodbri at swoodbridge.com
Wed Sep 17 12:22:34 PDT 2008
This is one approach to the problem, but it does not deal with the real
problems of matching user entered addresses with addresses encoded on
street segments.
For example: matching AL 44, Alabama 44, AL-44, Alabama Highway 44,
Highway 44, State Highway 44, Rt 44, and various other abbreviations for
Highway; simple typos; N, N., North, S, S., South, etc. designations
added to the highway; Alt., Bus., Byp., etc.; and on it goes. You also
need to deal with accented characters that are sometimes entered
without accents.
In a geocoder, you typically have a standardizer that sorts out all that
craziness. When you load the geocoder, you standardize the vendor data
and store it in a standard form. When you get a geocode request, you
standardize the incoming request and then try to match its standard form
against the vendor data, which is also in standard form.
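To make the standardizer idea concrete, here is a minimal sketch in
Python. The lookup tables, the HWY marker and the function names are all
made up for illustration; a real standardizer needs far larger tables
and many more rules than this.

```python
import re
import unicodedata

# Hypothetical abbreviation tables, for illustration only.
DIRECTIONS = {"n": "N", "north": "N", "s": "S", "south": "S",
              "e": "E", "east": "E", "w": "W", "west": "W"}
STREET_TYPES = {"st": "ST", "street": "ST", "ave": "AVE", "avenue": "AVE",
                "rd": "RD", "road": "RD"}

# A state word and/or a highway word followed by a route number,
# e.g. "al 44", "state highway 44", "rt 44".
HIGHWAY_RE = re.compile(
    r"\b(?:(?:al|alabama|state)\s+(?:(?:highway|hwy|rt|rte|route)\s+)?"
    r"|(?:highway|hwy|rt|rte|route)\s+)(\d+)\b"
)


def strip_accents(text):
    """Decompose accented characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardize(address):
    """Return a canonical upper-case form of a street address string."""
    text = strip_accents(address).lower()
    # Treat punctuation (periods, hyphens, commas) as token separators.
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    # Collapse every highway spelling to the single marker "hwy <number>".
    text = HIGHWAY_RE.sub(r"hwy \1", text)
    out = []
    for tok in text.split():
        if tok in DIRECTIONS:
            out.append(DIRECTIONS[tok])
        elif tok in STREET_TYPES:
            out.append(STREET_TYPES[tok])
        else:
            out.append(tok.upper())
    return " ".join(out)
```

The same function would be run over the vendor data at load time and
over each incoming request, so that both sides are compared in the same
form.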
You can also use techniques like metaphone/soundex codes to do fuzzy
searching and then use Levenshtein distance to score the possible
matches by how close they are to the request.
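A rough Python sketch of that two-step idea follows: a phonetic code
narrows the candidates, then edit distance ranks them. It uses plain
American Soundex rather than metaphone, and rank_candidates is a
hypothetical helper name, not part of any geocoder.

```python
def soundex(word):
    """Return the 4-character American Soundex code for a word."""
    codes = {}
    for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")):
        for ch in letters:
            codes[ch] = digit
    word = "".join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ""
    result = word[0].upper()
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        # 'h' and 'w' do not separate letters that share a code.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def rank_candidates(query, candidates):
    """Keep candidates with the same Soundex code, closest first."""
    key = soundex(query)
    matches = [c for c in candidates if soundex(c) == key]
    return sorted(matches,
                  key=lambda c: levenshtein(query.lower(), c.lower()))
```

In practice the phonetic code would be precomputed and stored with each
street name, so the fuzzy lookup is an indexed equality test and only
the short list of survivors needs scoring.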
You need to be prepared to handle multiple results to a query, for
example you search for Oak St. but only find North Oak Street and South
Oak Street.
Also, what are you going to search? Your whole dataset, or are you also
going to want to filter it by city, state, postal code, country? In this
case you also need to be able to parse the full address into all these
additional terms and filter your search to those appropriate to that
limited region.
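As a very rough illustration of that parsing step, the Python sketch
below splits a "street, city, ST zip" string into its parts. The
pattern, the field names and parse_address are assumptions made for the
example; it only handles one comma-separated US-style layout and returns
None for anything else, which is exactly where real parsers get
complicated.

```python
import re

# Assumed layout: "<street>, <city>, <2-letter state> [<ZIP or ZIP+4>]".
ADDRESS_RE = re.compile(
    r"^\s*(?P<street>[^,]+?)\s*,\s*(?P<city>[^,]+?)\s*,\s*"
    r"(?P<state>[A-Za-z]{2})(?:\s+(?P<postcode>\d{5}(?:-\d{4})?))?\s*$"
)


def parse_address(text):
    """Split a full address into street, city, state and postcode.

    Returns a dict, or None when the text does not fit the assumed layout.
    """
    match = ADDRESS_RE.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    parts["state"] = parts["state"].upper()
    return parts
```

The city, state and postcode can then be used to restrict the street
search to that region, while the street part goes on to the matching
step.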
It makes much more sense to load the appropriate data records into a
relational database and make the queries in SQL. If you do not want to
use a full-blown database like PostgreSQL or MySQL, then look at SQLite,
which is a wonderful embedded database with zero management and has
bindings for C, Perl, Python, PHP, Tcl, etc.
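Here is a minimal sketch of the SQLite route using Python's built-in
sqlite3 module. The table layout, column names and the two helper
functions are invented for the example, and it assumes the street
records have already been put into a standard upper-case form before
loading.

```python
import sqlite3


def build_index(rows):
    """Load (prefix, name, type, city, state) rows into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE streets (id INTEGER PRIMARY KEY, prefix TEXT, "
        "name TEXT, type TEXT, city TEXT, state TEXT)"
    )
    conn.execute("CREATE INDEX streets_name ON streets (name, city, state)")
    conn.executemany(
        "INSERT INTO streets (prefix, name, type, city, state) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def find_streets(conn, name, city=None, state=None):
    """Return every street with this name, optionally limited by region."""
    sql = "SELECT prefix, name, type, city, state FROM streets WHERE name = ?"
    params = [name]
    if city is not None:
        sql += " AND city = ?"
        params.append(city)
    if state is not None:
        sql += " AND state = ?"
        params.append(state)
    sql += " ORDER BY id"
    return conn.execute(sql, params).fetchall()
```

A search for OAK restricted to one city can still come back with both
the N OAK ST and S OAK ST rows, so the calling code has to be ready to
present or score more than one candidate.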
My two cents, from someone that has his head and fingers in too many
geocoders.
-Steve
http://imaptools.com/
Steve Lime wrote:
> I think we'd need a fuzzy match operator, probably one specific to address
> matching. This would involve adding one or more C functions to compare two
> address strings and then tweaking the MapServer yacc grammar to recognize the
> new operator. The trick would obviously be to write the C function, and there
> are folks on the list with considerable experience with that problem.
>
> If you HAD that operator then presumably you could write different filters
> depending on your data, e.g.:
>
> ('user entered address' addreq '[address column]') or
> ('user entered address' addreq '[prefix] [name] [type] [suffix]')
>
> That would be faster than trying to manipulate the current operators. You could
> also do a very generic query, like a case insensitive lookup on the street name
> and then operate on that result set in your application to deal with data
> differences.
>
> Steve
>
>>>> "Emerson, Gabe" <gemerson at WelshCo.com> 09/17/08 10:16 AM >>>
> Hi All,
>
> I have an interesting mini-project which some of you might have dealt
> with before, I'd be interested in any suggestions.
>
> I'd like to run a query (presented to users as an address search),
> across multiple layers. For example, after an address is entered, the
> system first searches an in-house dataset, if there are no matches it
> searches a county parcel dataset, and if both fail, it tries to map the
> address via a geocoding API.
>
> The issue I'm running into is that each of the layers stores addresses a
> little differently. The in-house set tends to be sloppy about
> punctuation and things like directions ('N' vs 'North', 'St' vs 'ST.' vs
> 'Street', etc). The county is more standardized but breaks everything
> up into street prefix, name, type, suffix, etc. (Minnesota Met Council,
> for those of you familiar with it). In addition, users tend not to enter
> addresses the same way twice, and to leave out things like the street
> type and direction.
>
> I'm wondering if there's a way to relax the query matches so that
> something like "100 James" will return a match from a DBF containing
> "100 South James Ave", or a set of columns like "100" "S" "James" "Ave".
> Something along the lines of The Geocoding API is flexible in this way,
> so one solution I considered is to use it as an address parser and then
> use the returned X,Y data for an itemquery on each of the layers. The
> problems with that are slower performance and possible API
> unavailability.
>
> Currently I'm using Mapserver in CGI mode with some Javascript for
> frontend logic and custom tools. I developed the application this way
> for various reasons, but am considering moving to PHP Mapscript for
> better performance. If something like this is possible with the CGI
> approach I'd love to hear about it, but I'd also be interested in
> mapscript ideas or examples.
>
> Thanks!
>
> -Gabe
>
> Gabe Emerson
> Research Department
> Welsh Companies
> 4350 Baker Road, Suite 400
> Minnetonka, MN 55343-8695
> 952-897-7700, ext. 1306
> gemerson at welshco.com
>
> _______________________________________________
> mapserver-users mailing list
> mapserver-users at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/mapserver-users