[postgis-users] TIP: Loading TIGER data with Primary Keys

Steve Walker walker at mfgis.com
Tue Dec 20 19:33:45 PST 2011


Hi,

I want to offer the following tip on how to properly build and load a
nation-wide (or subset thereof) US Census Bureau TIGER database into
PostGIS using ogr2ogr, in such a manner as to maintain the primary key
integrity inherent in the native TIGER database.

For the purpose of this example, I'll identify two adjacent counties
within the same state:  CountyA and CountyB, and we will work only with
the edges theme.  

We can imagine that a road named "County Line Rd" forms part of the
boundary between these two counties.   

In the TIGER database, the road will represented by one or more EDGEs.
We'll simplify this to assume that there is exactly one edge feature
representing "County Line Rd"  and that the edge feature has a unique
Tiger Line Identifier (TLID) of 12345687890  which is unique throughout
the entire TIGER database, nationwide.   (We'll also simplify this to
ignore N-to-N relations between edges, roads, and names etc.  They're
important, but irrelevant to this discussion.)

So far so good.   Unfortunately however, the countyA_edges.shp and
countyB_edges.shp each maintain a representation of this feature.  TLID
1234567890, although a unique primary key, is duplicated between two
shapefiles.  

Normally, ogr2ogr would neither notice nor would it care.  This is
because ogr2ogr will by default add and populate an arbitrary primary
key attribute 'ogc_fid' when loading data.  

Thus, the plain vanilla ogr2ogr load of TIGER will not fail due to any
duplicate primary key violations, but at the expense of losing the
normalized integrity of the database.  We will end up with County Line
Rd and TLID 1234567890 being duplicated in the database, and we won't be
able to leverage the native TLID primary key in all our subsequent
operations.

Here is how to fix that:


So the first thing we would reasonably do would be something like this:

ALTER TABLE edges DROP COLUMN ogc_fid;    

This would dump the arbitrary primary key set up by ogr2ogr

And then we would attempt:

ALTER TABLE edges ADD PRIMARY KEY (tlid);

Which would fail once it hit the second record with a TLID of 1234567890
(ie because it duplicates the primary key constraint.)

So, this cannot be easily done after the fact:  IE when we have a whole
state or the whole nation already loaded into the database.   
-------


OK, so here is the TIP   (pseudo-code):

Starting with an totally empty database:


shell:
#ogr2ogr countyA_edges.shp EDGES    #### load the first shapefile.    

psql:
psql: ALTER TABLE edges DROP column ogc_fid     ### drop the arbitrary
primary key

psql:  ALTER TABLE edges ADD PRIMARY KEY (tlid);   ### add the real
primary key.

(back to the shell:)

#ogr2ogr -skipfailures -gt 1 countyB_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyC_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyD_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyE_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyF_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyG_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyH_edges.shp EDGES
#ogr2ogr -skipfailures -gt 1 countyI_edges.shp EDGES
#etc..... through all 3000+ counties

The two flags 'skipfailures' and 'gt 1' make this possible.

-gt 1 isolates each and every feature in the shapefile into its own
transition block, (BEGIN  ... INSERT .... COMMIT|ROLLBACK)  

thus, the second attempt to INSERT tlid 1234567890 will fail with a
primary key constraint violation.    But when it fails, it will only
ROLLBACK this single individual INSERT.  the previous record and the
following record won't be affected (without gt -1 it would ROLLBACK
everything)

-skipfailures flag simply allows the bulk load procedure to continue
when the individual INSERTS fail.


====

That's really what it takes to build primary key integrity into your
TIGER database from the outset.

-S







-- 
Steve Walker
Middle Fork Geographic Information Services
(360)671-2505




More information about the postgis-users mailing list