[postgis-users] TIGER geocoder with Census 2009 shapefiles
Mark Vantzelfde
netmasters.ma at gmail.com
Tue Mar 2 10:10:21 PST 2010
Can anyone comment on the accuracy of the Tiger geocoder vs MapMarker?
Thanks
Mark
On Tue, Mar 2, 2010 at 11:40 AM, Stephen Woodbridge <woodbri at swoodbridge.com
> wrote:
> Hi Kevin,
>
> I have worked with the Tiger data for about 10 years now. The recent
> improvements in tiger are really great to see, but not without their own set
> of issues. Tiger has a lot of known limitations based on the rules, regs and
> requirements of the US Census. The recent work has georectified the street
> data and added lots of new streets based on digitizing high-res satellite
> imagery, but that does not let you read the street names, so they are added
> after the fact. There are a lot of street segments that do not have names.
> We can only hope that these will be added over time. Because of
> non-disclosure, address ranges can be weird also. Many small streets have
> address ranges 1-100 encoded on them, in spite of the fact that the real
> address ranges only run from 1-20. This has the effect of skewing all the
> locations to the front end of the street.
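To illustrate the skew described above, here is a minimal sketch, assuming a geocoder that places a house number by simple linear interpolation along the segment. The function name `interpolate_fraction` is hypothetical and is not taken from any geocoder mentioned in this thread.

```python
def interpolate_fraction(house_number, range_from, range_to):
    """Return the fractional position (0.0 to 1.0) of a house number
    along a street segment, assuming plain linear interpolation."""
    if range_to == range_from:
        # Degenerate range: place the address at the midpoint.
        return 0.5
    fraction = (house_number - range_from) / (range_to - range_from)
    # Clamp so out-of-range house numbers stay on the segment.
    return min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    real_range = (1, 20)      # what is actually on the ground
    encoded_range = (1, 100)  # what the data claims
    for house_number in (1, 10, 20):
        true_pos = interpolate_fraction(house_number, *real_range)
        coded_pos = interpolate_fraction(house_number, *encoded_range)
        print(f"house {house_number}: real {true_pos:.2f}, encoded {coded_pos:.2f}")
```

With the encoded 1-100 range, house number 20 (really the last house on the street) lands only about a fifth of the way along the segment.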
>
> Because language is ambiguous and addresses contain typos and sounds-like
> errors, fuzzy searching is employed. Most geocoders do some form of fuzzy
> searching, so you often run into the Main St vs Main Ln issue, or you find
> W Main St when you are searching for E Main St.
>
> When a geocoder says "Found it!", you need to be prepared to say "Found
> what?" or be tolerant of mis-geocodes. I like geocoders that score the
> results and return them in ranked order.
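A rough sketch of what scoring and ranking candidates could look like, assuming a plain string-similarity measure from Python's standard `difflib` module. The function names, the `DIRECTIONALS` set and the 0.8 penalty factor are all made up for illustration; this is not how the TIGER geocoder or PAGC score matches.

```python
from difflib import SequenceMatcher

# Hypothetical set of directional prefixes used only for this example.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def score_candidate(query, candidate):
    """Score a candidate street name against the query (0.0 to 1.0)."""
    q_tokens = query.upper().split()
    c_tokens = candidate.upper().split()
    if not q_tokens or not c_tokens:
        return 0.0
    score = SequenceMatcher(None, " ".join(q_tokens), " ".join(c_tokens)).ratio()
    # Penalize a different last token (street type): Main St vs Main Ln.
    if q_tokens[-1] != c_tokens[-1]:
        score *= 0.8
    # Penalize a different leading directional: E Main St vs W Main St.
    has_directional = q_tokens[0] in DIRECTIONALS or c_tokens[0] in DIRECTIONALS
    if has_directional and q_tokens[0] != c_tokens[0]:
        score *= 0.8
    return score


def rank_candidates(query, candidates):
    """Return (score, candidate) pairs, best match first."""
    scored = [(score_candidate(query, c), c) for c in candidates]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    for score, name in rank_candidates(
        "E Main St", ["W Main St", "E Main Ln", "E Main St", "E Maine St"]
    ):
        print(f"{score:.3f}  {name}")
```

Returning the scores along with the names lets the caller decide where to draw the line instead of trusting a bare "Found it!".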
>
> In general a geocoder can never be better than its data and can in fact be
> much worse than its data. Fuzzy searching lets you find possible candidates
> in the data that might not have been encoded correctly in either the input
> address or the data address, but with the uncertainty that this is the
> actual location wanted or not.
>
> You might also want to look at PAGC Geocoder. It is written in C and uses
> some statistical matching techniques which are very good. There are some
> changes in one of the branches that let you load all the Tiger data for the
> US.
>
> http://www.pagcgeo.org/
>
>
> -Steve
>
>
> Kevin Galligan wrote:
>
>> I actually bought an early access copy of the book. I work in linux and
>> have been playing around with different geocoders and the tiger files. Most
>> recently with a ruby geocoder, for no other reason than I'm trying to find
>> one that is fairly complete and functional.
>>
>> Any idea how "production quality" this particular one is? If it's fairly
>> high, I'll probably put some time in to get it working on linux. I have the
>> full 2009 tiger dataset on an EC2 block drive, waiting to import into a
>> different database.
>>
>> Right now I'm using zip+4 data to get a rough geocode, which is good
>> enough for what we're doing, but it only gets 92% of our non-PO Box data.
>> From my experience with the tiger data, it only adds a couple percent at
>> most above that, but the geocoders I've used have been pretty hacky, so it's
>> possible that was the issue. Also, some of them seem to not be concerned
>> with stuff like matching "Main St" when you're looking for "Main Ln", which
>> is pretty terrible.
>>
>> On the plus side, if there is major work going on with this geocoder (or
>> any tiger geocoder), I have a huge national data volume that will help
>> stress test the system.
>>
>> Recently I've been toying with USC's free geocoder project. In some areas
>> it actually gets about half of the data I previously could not, which is
>> impressive.
>>
>> The really frustrating thing is, in general, the first 90% is cheap/free.
>> The next 3-4% is marginally expensive. The rest is really pricey.
>>
>> Is there any idea how complete the tiger data is, and why there is this
>> apparent lack of data in there? I find it strange. Some streets are just
>> missing. Stuff like that.
>>
>> Rambling. Anyway, will take a look later. Thoughts on the quality of the
>> geocoder appreciated.
>>
>> -Kevin
>>
>> On Fri, Feb 26, 2010 at 11:52 PM, Paragon Corporation <lr at pcorp.us>
>> wrote:
>>
>> David,
>>
>> As a matter of fact we've been working on that for chapter 10 of our
>> upcoming book and think we have it all working. As a part of the example
>> generation process for our chapter 10, we had to come up with a way to load
>> the tables that works on both Windows and Linux. Unfortunately we haven't
>> had a chance to test the Linux loading approach, but it is pretty much a
>> parallel of the Windows approach.
>>
>> To do so we started out with Steve's code, added some additional skeleton
>> tables and a database function that generates a command line script for the
>> respective OS. Hopefully it all makes sense from the readme file we have
>> packaged.
>>
>> We also changed one of the functions because there was an error in it and
>> revised it slightly to work with Tiger 2009 data. You can download our
>> slightly hacked version of Steve's code from our chapter 10 page.
>>
>> Steve -- if you are listening we are hoping to remerge your version with our
>> loader part and bring it back into the PostGIS distribution as part of the
>> PostGIS 1.5.1 or 2.0 release.
>>
>> http://www.postgis.us/chapter_10
>>
>>
>> Leo and Regina
>> http://www.postgis.us/
>>
>>
>> -----Original Message-----
>> From: postgis-users-bounces at postgis.refractions.net
>> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>> Dave Fuhry
>> Sent: Friday, February 26, 2010 3:04 PM
>> To: PostGIS Users Discussion
>> Subject: [postgis-users] TIGER geocoder with Census 2009 shapefiles
>>
>> I'm trying to set up the TIGER geocoder from
>> http://www.snowman.net/git/tiger_geocoder/ which is new and aims to work
>> with the new TIGER shapefiles. I'm trying with the 2009 shapefiles from
>> http://www2.census.gov/geo/tiger/TIGER2009/.
>>
>>
>> I'm not sure how to create the roads_local table (derived closely from
>> completechain in the old version). A join between edges and addr?
>>
>> Wondering if anyone can offer any direction. A relevant ticket is
>> http://trac.osgeo.org/postgis/ticket/135. The out-of-date file which used
>> to create the roads_local table is tables/roads_local.sql, in the above
>> repository.
>>
>> -Dave
>>
>> Table "tiger.edges"
>> Column     | Type                   | Modifiers
>> -----------+------------------------+------------------------------------------------------------
>> gid        | integer                | not null default nextval('public.edges_gid_seq'::regclass)
>> statefp    | character varying(2)   |
>> countyfp   | character varying(3)   |
>> tlid       | bigint                 |
>> tfidl      | bigint                 |
>> tfidr      | bigint                 |
>> mtfcc      | character varying(5)   |
>> fullname   | character varying(100) |
>> smid       | character varying(22)  |
>> lfromadd   | character varying(12)  |
>> ltoadd     | character varying(12)  |
>> rfromadd   | character varying(12)  |
>> rtoadd     | character varying(12)  |
>> zipl       | character varying(5)   |
>> zipr       | character varying(5)   |
>> featcat    | character varying(1)   |
>> hydroflg   | character varying(1)   |
>> railflg    | character varying(1)   |
>> roadflg    | character varying(1)   |
>> olfflg     | character varying(1)   |
>> passflg    | character varying(1)   |
>> divroad    | character varying(1)   |
>> exttyp     | character varying(1)   |
>> ttyp       | character varying(1)   |
>> deckedroad | character varying(1)   |
>> artpath    | character varying(1)   |
>> persist    | character varying(1)   |
>> gcseflg    | character varying(1)   |
>> offsetl    | character varying(1)   |
>> offsetr    | character varying(1)   |
>> tnidf      | bigint                 |
>> tnidt      | bigint                 |
>> the_geom   | public.geometry        |
>>
>>
>> Table "tiger.addr"
>> Column    | Type                  | Modifiers
>> ----------+-----------------------+-----------------------------------------------------------
>> gid       | integer               | not null default nextval('public.addr_gid_seq'::regclass)
>> tlid      | bigint                |
>> fromhn    | character varying(12) |
>> tohn      | character varying(12) |
>> side      | character varying(1)  |
>> zip       | character varying(5)  |
>> plus4     | character varying(4)  |
>> fromtyp   | character varying(1)  |
>> totyp     | character varying(1)  |
>> fromarmid | integer               |
>> toarmid   | integer               |
>> arid      | character varying(22) |
>> mtfcc     | character varying(5)  |
>> statefp   | character varying(2)  | not null
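On the "join between edges and addr" question, a minimal sketch of what such a join might look like, assuming `tlid` is the shared key (both table listings above carry it). SQLite from Python's standard library stands in for PostGIS here, only a handful of the listed columns are kept, and the rows are invented sample values. The function name `join_edges_addr` is hypothetical, and this is not the actual roads_local definition.

```python
import sqlite3


def join_edges_addr(conn):
    """Return one row per named edge and address range, joined on tlid."""
    return conn.execute(
        """
        SELECT e.tlid, e.fullname, a.side, a.fromhn, a.tohn, a.zip
        FROM edges AS e
        JOIN addr AS a ON a.tlid = e.tlid
        WHERE e.fullname IS NOT NULL
        ORDER BY e.tlid, a.side
        """
    ).fetchall()


def make_sample_db():
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE edges (tlid INTEGER, fullname TEXT);
        CREATE TABLE addr (tlid INTEGER, fromhn TEXT, tohn TEXT,
                           side TEXT, zip TEXT);
        -- Invented values, for illustration only.
        INSERT INTO edges VALUES (1001, 'Main St');
        INSERT INTO edges VALUES (1002, NULL);
        INSERT INTO addr VALUES (1001, '1', '99', 'L', '44240');
        INSERT INTO addr VALUES (1001, '2', '98', 'R', '44240');
        INSERT INTO addr VALUES (1002, '1', '19', 'L', '44240');
        """
    )
    return conn


if __name__ == "__main__":
    for row in join_edges_addr(make_sample_db()):
        print(row)
```

The unnamed edge (tlid 1002) drops out of the result, which is one place where segments without street names would show up as missing matches.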
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at postgis.refractions.net
>>
>> http://postgis.refractions.net/mailman/listinfo/postgis-users
>>
>
>
--
Mark Vantzelfde
NetMasters, Inc.