[postgis-devel] Typo 1 Re: PostGIS 2.0.0SVN: TIGER Geocoder and TIGER's own Primary Keys and the use of tableA.statefp = tableB.statefp: I think its' a kludge

Steve Walker walker at mfgis.com
Tue Dec 20 18:00:34 PST 2011


Sorry for the typo

Where I wrote 

(addrfn.arid = addrfn.arid)

I should have written

(addr.arid = addrfn.arid)     
(ie to link the ADDR and ADDRFN tables via the ARID key)


This does not change the fundamentals of the hypothesis in any way.


On Tue, 2011-12-20 at 17:31 -0800, Steve Walker wrote:
> (Disclaimer in advance:  I could be totally wrong)
> 
> I have just started to work with the TIGER Geocoder against a
> previously pre-populated complete TIGER database.  
> 
> [Aside:  Recommend de-coupling code that builds the geocoder from code
> that imports the database]
> 
> 
> It seems to me that the geocoder ignores TIGER's own native primary
> keys and relies on some unnecessary workarounds (ie populating tables
> unnecessarily with the STATEFP attribute).
> 
> I have started digging into for example the geocode.sql script and so
> can give one specific example.   But let me back up.
> 
> I believe the problem stems from the very beginning - during the
> creation and import of the TIGER data itself.   Most TIGER themes do
> have native primary keys (e.g. the EDGES theme has the primary key
> TLID, the ADDR theme has the pkey 'ARID' and links to the FEATNAMES
> theme via ADDRFN arid-->linearid).  However, these will be duplicated
> at the shared boundaries of counties and states, and so the process of
> importing the entire database needs to take this into account.  I
> achieved this by making sure the following flags were included in the
> ogr2ogr command which loads the TIGER data:
> 
> -skipfailures -gt 1
> 
> The -gt 1 (Group features per Transaction = 1 ) flag will attempt to
> load and then verify every single record individually.  The
> -skipfailures flag will keep the bulk load from crashing when a
> specific record fails.  So here, what we're doing is rejecting
> one-by-one the features that would violate the primary key constraint.
> (Fake example: "TLID:   12345567890; FULLNAME:  County Line RD" would
> be a line shared by two county shapefiles.  The second attempt to
> insert this into the database would fail gracefully, since the TLID
> pkey already exists)
> 
> The end result is a nation-wide TIGER database with all native primary
> keys.
> 
> I believe that the TIGER geocoder suffers from the lack of this use of
> native primary key integrity.
> 
> Following from that belief, I have a hunch that the geocoder uses a
> number of unneccesary workarounds.
> 
> Namely, I see the geocoder artificially and unnecessarily populating
> state FIPS (statefp) data into tables (e.g. FEATNAMES, ZCTA5, etc)
> when the database normal form does not require it.
> 
> Here is the promised example, from the file 'geocode_address.sql'
> 
> Current SVN code, requiring the tableA.statefp = tableB.statefp kludge
> (around line 107)
> 
> FROM featnames AS f INNER JOIN addr As ad ON (f.tlid = ad.tlid) WHERE
> $10 = f.statefp AND $10 = ad.statefp      ### neither Featnames nor
> Addr have  STATEFP as a native attribute 
> 
> I think the following does the same thing using the native TIGER data
> and without requiring the kludge of adding and populating the statefp
> attribute into the featnames table:
> 
> FROM featnames NATURAL JOIN addrfn INNER JOIN
> addr(addrfn.arid=addrfn.arid) WHERE featnames.tlid = addr.tlid '   ###
> requires only the columns native to the TIGER tables
> 
> ......
> 
> And so that is where I am right now.  Looking through the various SQL
> statements I see this WHERE tableA.statefp = tableB.statefp clause
> peppered throughout the code.
> 
> 
> I do not believe there is any inherent reason within the TIGER
> database to require these WHERE clauses, but the complexity of the SQL
> code is such that I don't yet understand it fully enough to make all
> (more?) of the required changes.
> 
> 
> Discussion?
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
> 
-- 
Steve Walker
Middle Fork Geographic Information Services
(360)671-2505




More information about the postgis-devel mailing list