[postgis-users] TIGER Geocoder and TIGER'sown Primary Keys and the use of tableA.statefp =tableB.statefp: I think its' a kludge

Paragon Corporation lr at pcorp.us
Wed Dec 21 12:08:50 PST 2011


Steve,
> 
> I don't see where the topology issue is relevant here.  
> Everything should be able to be done within SQL without 
> reliance upon topology, when TIGER's primary keys are used.
> 

The only reason on my radar for wanting a normalized tiger dataset is for
breaking
edges at the local government level and to keep track of what edges were
edited and so forth as well as fitting my existing geometries within the
topology of TIGER.

Your vision of normalization doesn't satisfy that need because a geometry is
inherently a denormalized
vision of a normalized topology.

> Perhaps the geocoder as is has gone too far down the path 
> you've described to be easily adapted?
> 
What benefit are you striving to get out of this aside? It would help if I
understood that more clearly.

I can forsee it would be faster if TLID and tfid were made primary keys, but
I dismissed it for the time being since I didn't have time to benchmark the
difference and also write routines to fix data of people who have already
loaded their data.

As far as making tlid a primary key.  On further inspection of our loader, I
realized we load into staging tables
anyway so that we could rename all the fields that TIGER found the need to
tack years on. So as part of that routine we drop the generated gid anyway,
rename some others, add others, and drop others.

So that probably wouldn't take too much effort.  The easiest would be to use
an EXCEPT clause since each county is loaded into staging and then added to
the core state table.  It would probably be slower than an append skip, but
append skip is not an option since the data is already in the database
by the time we get to that point.

It won't change the queries though because we would still 
need the state joins to take advantage of constraint exclusion unless you
have other thoughts on that.  So that alone violates perfect normalization
at the TLID  level.


Again keep in mind our priorities are:
1) Build a geocoder that is easy for everyone to load and use on any
platform PostGIS runs on
2) User Maintainability - that means being able to drop state tables etc and
load newer ones piecemeal or migrate a set of states to another database
without having to change the underlying code.

3) Speed
4) Developer Maintainability -- I put this as last because it's too open to
debate how you should structure your code so easy to some is not necessarily
easy to others without knowing the developers invovled.

For example I would never use a NATURAL JOIN because its too prone to all
your code breaking if you happen to have two columns with same name but
different meeting, but that's just me.

Thanks,
Regina
http://www.postgis.us






More information about the postgis-users mailing list