[postgis-devel] Typo 1 Re: PostGIS 2.0.0SVN: TIGER Geocoder and TIGER's own Primary Keys and the use of tableA.statefp = tableB.statefp: I think its' a kludge
Steve Walker
walker at mfgis.com
Tue Dec 20 18:00:34 PST 2011
Sorry for the typo
Where I wrote
(addrfn.arid = addrfn.arid)
I should have written
(addr.arid = addrfn.arid)
(ie to link the ADDR and ADDRFN tables via the ARID key)
This does not change the fundamentals of the hypothesis in any way.
On Tue, 2011-12-20 at 17:31 -0800, Steve Walker wrote:
> (Disclaimer in advance: I could be totally wrong)
>
> I have just started to work with the TIGER Geocoder against a
> previously pre-populated complete TIGER database.
>
> [Aside: Recommend de-coupling code that builds the geocoder from code
> that imports the database]
>
>
> It seems to me that the geocoder ignores TIGER's own native primary
> keys and relies on some unnecessary workarounds (ie populating tables
> unnecessarily with the STATEFP attribute).
>
> I have started digging into for example the geocode.sql script and so
> can give one specific example. But let me back up.
>
> I believe the problem stems from the very beginning - during the
> creation and import of the TIGER data itself. Most TIGER themes do
> have native primary keys (e.g. the EDGES theme has the primary key
> TLID, the ADDR theme has the pkey 'ARID' and links to the FEATNAMES
> theme via ADDRFN arid-->linearid). However, these will be duplicated
> at the shared boundaries of counties and states, and so the process of
> importing the entire database needs to take this into account. I
> achieved this by making sure the following flags were included in the
> ogr2ogr command which loads the TIGER data:
>
> -skipfailures -gt 1
>
> The -gt 1 (Group features per Transaction = 1 ) flag will attempt to
> load and then verify every single record individually. The
> -skipfailures flag will keep the bulk load from crashing when a
> specific record fails. So here, what we're doing is rejecting
> one-by-one the features that would violate the primary key constraint.
> (Fake example: "TLID: 12345567890; FULLNAME: County Line RD" would
> be a line shared by two county shapefiles. The second attempt to
> insert this into the database would fail gracefully, since the TLID
> pkey already exists)
>
> The end result is a nation-wide TIGER database with all native primary
> keys.
>
> I believe that the TIGER geocoder suffers from the lack of this use of
> native primary key integrity.
>
> Following from that belief, I have a hunch that the geocoder uses a
> number of unneccesary workarounds.
>
> Namely, I see the geocoder artificially and unnecessarily populating
> state FIPS (statefp) data into tables (e.g. FEATNAMES, ZCTA5, etc)
> when the database normal form does not require it.
>
> Here is the promised example, from the file 'geocode_address.sql'
>
> Current SVN code, requiring the tableA.statefp = tableB.statefp kludge
> (around line 107)
>
> FROM featnames AS f INNER JOIN addr As ad ON (f.tlid = ad.tlid) WHERE
> $10 = f.statefp AND $10 = ad.statefp ### neither Featnames nor
> Addr have STATEFP as a native attribute
>
> I think the following does the same thing using the native TIGER data
> and without requiring the kludge of adding and populating the statefp
> attribute into the featnames table:
>
> FROM featnames NATURAL JOIN addrfn INNER JOIN
> addr(addrfn.arid=addrfn.arid) WHERE featnames.tlid = addr.tlid ' ###
> requires only the columns native to the TIGER tables
>
> ......
>
> And so that is where I am right now. Looking through the various SQL
> statements I see this WHERE tableA.statefp = tableB.statefp clause
> peppered throughout the code.
>
>
> I do not believe there is any inherent reason within the TIGER
> database to require these WHERE clauses, but the complexity of the SQL
> code is such that I don't yet understand it fully enough to make all
> (more?) of the required changes.
>
>
> Discussion?
>
>
>
>
>
>
>
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
--
Steve Walker
Middle Fork Geographic Information Services
(360)671-2505
More information about the postgis-devel
mailing list