[postgis-users] FW: [postgis-devel] PostGIS 2.0.0SVN: TIGER Geocoder and TIGER'sown Primary Keys and the use of tableA.statefp =tableB.statefp: I think its' a kludge

Paragon Corporation lr at pcorp.us
Wed Dec 21 11:37:02 PST 2011


Just realized I sent this to the wrong list.   

-----Original Message-----
From: Paragon Corporation [mailto:lr at pcorp.us] 
Sent: Wednesday, December 21, 2011 6:27 AM
To: 'PostGIS Development Discussion'
Subject: RE: [postgis-devel] PostGIS 2.0.0SVN: TIGER Geocoder and TIGER'sown
Primary Keys and the use of tableA.statefp =tableB.statefp: I think its' a
kludge

> Yet my  biggest personal
>  challenge is reading through the sql code and the chained  clauses, 
> in
>  for example 'geocode.sql'.   A couple things that just made 
> it tough for
> me to understand - beyond my limited ability to chain together all the
> clauses - is some of the uses of aliases in the sql code.   
>  At one point
>  'f' is an alias for 'featnames' at another point 'f' is an alias for
>  'faces.'    I'm trying to re-write stuff by dropping the 
> aliases in
>  favor of explicit table.attribute syntax so I can more
> > explicitly see
>  the actual tables and attributes with which I'm working.

Steve,

I have mixed feelings about this.   I wasn't the one that wrote the bulk of
the sql.

Steve Frost and his friend (who I can't quite remember his name) did that
and I didn't have any problems reading there code.  They did a great job in
fact.

The main issue I have with using the correct name is 2

1) Practicality - you can't because some pieces do require joining by the
same table more than once so at least one is not going to have the same name
and the other is not. Sure you could alias it to be suffixed with faceA ,
faceB etc. So that's minor
2) It's a pain to type and the extended length makes it harder for me to
read as I have more lines to scan to gain clarity.

That said I'm not totally against the idea if people feel it would make it
easier for them to understand and contribute.

On the pro side, I think I did get a bit lazy and when I was folding queries
or expanding them it was faster to just move the logic up and change the
alias to where it was before.  Thus getting into your frustration with the
same alias used to mean different things.

That should probably be changed.


> I do think I have a 100% complete and properly  normalized TIGER 2010 
> database to work against if that helps.

It would be useful to see if that helps with the speed of some things.  I
think the faces it would. edges I doubt it.

As far as removing statefp -- I don't see that working without just dumping
all the data into the same tables.  Without that your vision of a perfectly
normalized TIGER is not attainable as you'd still have dupes in the state
tables.


Keep in mind we chose to break data out by state not just for speed reasons,
but for maintainability.

a) We work with a lot of of users and most of them need only one state full
of data.
Its easier to load one set of state tables from one database to another than
to try to copy the whole thing for obvious reasons.

That is actually the primary reason we broke the data up this way.

b) The second is for speed, since for at least lower systems with lower
memory the partitioned approach seemed to be better speedwise.
For a system with lots of memory may not make too much of a difference.

I can not stress enough that we care more about ease of use and speed than
how normalized our data is.  Normalization is important when you are
managing / modifying TIGER data, but not when you are using it for
geocoding.  With that said, I would suggest you look at the PostGIS Topology
model.

If ever we were to build a geocoder on a normalized system (which would have
the side benefity of being useful for local gov TIGER management), it would
be based on PostGIS Topology.

Thanks,
Regina
http://www.postgis.us  







More information about the postgis-users mailing list