[postgis-devel] PostGIS 2.0.0SVN: TIGER Geocoder and TIGER'sown Primary Keys and the use of tableA.statefp =tableB.statefp: I think its' a kludge

Wed Dec 21 03:27:09 PST 2011

> Yet my  biggest personal 
>  challenge is reading through the sql code and the chained
>  clauses, in
>  for example 'geocode.sql'.   A couple things that just made 
> it tough for
> me to understand - beyond my limited ability to chain
> together all the
> clauses - is some of the uses of aliases in the sql code.   
>  At one point
>  'f' is an alias for 'featnames' at another point 'f' is 
> an alias for
>  'faces.'    I'm trying to re-write stuff by dropping the 
> aliases in
>  favor of explicit table.attribute syntax so I can more
> > explicitly see
>  the actual tables and attributes with which I'm working.

Steve,

I have mixed feelings about this.   I wasn't the one that wrote the bulk of
the sql.

Steve Frost and his friend (who I can't quite remember his name)
did that and I didn't have any problems reading there code.  They did a
great job in fact.

The main issue I have with using the correct name is 2

1) Practicality - you can't because some pieces do require joining by the
same table more than once so at least one is not going to have the same name
and the other is not. Sure you could alias it to be suffixed with faceA ,
faceB etc. So that's minor
2) It's a pain to type and the extended length makes it harder for me to
read as I have more lines
to scan to gain clarity.

That said I'm not totally against the idea if people feel it would make it
easier for them to understand and contribute.

On the pro side, I think I did get a bit lazy and when I was folding queries
or expanding them it was faster to just
move the logic up and change the alias to where it was before.  Thus getting
into your frustration with the same alias used to mean different things.

That should probably be changed.

> I do think I have a 100% complete and properly 
>  normalized TIGER
> 2010 database to work against if that helps.

It would be useful to see if that helps with the speed of some things.  I
think the faces it would. edges I doubt it.

As far as removing statefp -- I don't see that working without just dumping
all the data into the same tables.  Without that your vision
of a perfectly normalized TIGER is not attainable as you'd still have dupes
in the state tables.

Keep in mind we chose to break data out by state not just for speed reasons,
but for maintainability.

a) We work with a lot of of users and most of them need only one state full
of data.
Its easier to load one set of state tables from one database to another than
to try to copy the whole thing for obvious reasons.

That is actually the primary reason we broke the data up this way.

b) The second is for speed, since for at least lower systems with lower
memory the partitioned approach seemed to be better speedwise.
For a system with lots of memory may not make too much of a difference.

I can not stress enough that we care more about ease of use and speed than
how normalized our data is.  Normalization is important
when you are managing / modifying TIGER data, but not when you are using it
for geocoding.  With that said, I would suggest you look at 
the PostGIS Topology model.

If ever we were to build a geocoder on a normalized system (which would have
the side benefity of being useful for local gov TIGER management),
it would be based on PostGIS Topology.

Thanks,
Regina
http://www.postgis.us