[Geodata] [Tiger] A few interesting observations on the Tiger2007fe data

Stephen Woodbridge woodbri at swoodbridge.com
Wed Jul 2 15:54:21 EDT 2008


John P. Linderman wrote:
> Stephen Woodbridge <woodbri at swoodbridge.com> said (in part):
>> It would be great if someone from the US Census would monitor
>> this list.  I'll have to see if I can find anyone that might be
>> interested. It would also be neat to set up some kind of database like:
>>
>> user|date|tlid|ss|ccc|file|action|fieldname|oldval|newval
>>
>> This would allow us to create a database of corrections, errors, etc 
>> that could be automatically applied to the data when processing it and 
>> could be given to the Census if they are interested?
>>
>> Any thoughts on this, on setting something like this up? Maybe it is not 
>> worth the effort.
>>
>> -Steve
> 
> I think that's a fabulous idea.  It sounds, from mail I got back
> from TigerLine, that they don't expect to do a real cleanup
> until after the 2010 census...
> 
>> John,
>>
>> The Census Bureau geography staff ran an address edit program to fix
>> address range inconsistencies like the one that you mention below;
>> however, some of the address ranges could have been missed.  It is also
>> possible that the 'mixed order range' is due to the introduction of new
>> address ranges to the database.  Depending on the source file that was used
>> to update the database, the data could have been entered according to
>> the source file or it could have been entered in reverse order.  In either
>> case, no more address range edits will be run on the data until sometime
>> after the 2010 Census; therefore, these inconsistencies will continue to
>> appear in the data.
> 
> So we either live with dirty data for two more years, correct our own
> copies, or make the corrections irresistible to the TigerLine people.
> Which indicates to me that we invite them to make suggestions about the
> format they might find most useful.  I'll copy the generic tiger email
> contact, and recommend they might want to elect someone to sign up at
> 
> http://lists.osgeo.org/mailman/listinfo/geodata
> 
> if they are interested in what we are trying to do (it's low volume).
> But let's not cc them, unsolicited, on too much stuff, lest we wear
> out our welcome.
> 
> As for your specific suggestion,
> 
>> user|date|tlid|ss|ccc|file|action|fieldname|oldval|newval
> 
> since not every file has tlid as a key, I think we might want
> something closer to
> 
> ss|ccc|file|key|fieldname|date|action|oldval|newval|user|comments
> 
> If sorted, this brings all the action for a given file, record,
> and field together, which would be pretty handy for seeing if
> a proposed change is already present.  (I tend to think in terms
> of sorted flat files, since they are highly portable, but this
> doesn't exclude a database version with indexes on all the
> important fields.)  The comments (which might be broken down
> into further fields, if they are structured), would make the
> changes easier to understand, like
> 
> reduced address number 277 to 27 because range 29-99 appears in tlid 60569908
> 
> And, given that we probably haven't thought of all the fields,
> and that comments might get pretty long, we might want key/value
> pairs on multiple lines, separated by empty lines, like
> 
> ss	34
> ccc	039
> file	edges
> ...
> #	adjacent edge 60569908 begins at address 29
> #	adjacent edge 60569922 ends at address 21
> 
> In any event, the ability to share corrections in a structured
> way seems most worthwhile.
> 
> PS: I'll comment on other stuff from Stephen in separate mail,
> since this seems like an important thread of its own. -- jpl
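
John's key/value stanzas can be converted mechanically into his sorted
pipe-delimited layout. A minimal Python sketch, assuming stanzas are
separated by blank lines, each line is a tab-separated name and value,
and a name of `#` marks a free-form comment line that gets folded into
the `comments` field. The field order follows his proposed
ss|ccc|file|key|fieldname|date|action|oldval|newval|user|comments line;
the function names and the sample key, fieldname, date and user values
are hypothetical placeholders, not an agreed format.

```python
import re

# Field order proposed in the thread; sorting records on this order
# brings all changes to one file, record and field together.
FIELDS = ["ss", "ccc", "file", "key", "fieldname", "date",
          "action", "oldval", "newval", "user", "comments"]


def parse_stanzas(text):
    """Parse stanzas separated by blank lines; each line is 'name<TAB>value'.

    Lines whose name is '#' are free-form comments (an assumed convention)
    and are joined with '; ' into the 'comments' field.
    """
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rec, comments = {}, []
        for line in block.splitlines():
            name, _, value = line.partition("\t")
            if name.strip() == "#":
                comments.append(value.strip())
            elif name.strip():
                rec[name.strip()] = value.strip()
        rec["comments"] = "; ".join(comments)
        records.append(rec)
    return records


def to_line(rec):
    """Render one record as a pipe-delimited line in FIELDS order."""
    values = [rec.get(f, "") for f in FIELDS]
    for v in values:
        if "|" in v:
            # No escaping scheme has been agreed, so refuse ambiguous values.
            raise ValueError(f"value contains '|': {v!r}")
    return "|".join(values)


def sorted_lines(records):
    """Sort on the tuple of fields, not the joined string: '|' sorts after
    digits and letters, so string order would put '340|...' before '34|...'."""
    ordered = sorted(records, key=lambda r: tuple(r.get(f, "") for f in FIELDS))
    return [to_line(r) for r in ordered]


# Illustrative stanza; key, fieldname, date and user are placeholders.
SAMPLE = """\
ss\t34
ccc\t039
file\tedges
key\t12345678
fieldname\tLFROMADD
date\t2008-07-02
action\tupdate
oldval\t277
newval\t27
user\tjpl
#\tadjacent edge 60569908 begins at address 29
#\tadjacent edge 60569922 ends at address 21
"""

if __name__ == "__main__":
    for line in sorted_lines(parse_stanzas(SAMPLE)):
        print(line)
```

Keeping the stanza form for editing and the sorted one-line form for
storage gives the best of both: long comments stay readable, while the
flat file still sorts and greps well.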

I think it would generally be worthwhile to maintain a "collection of
corrections and comments" about various Census objects. I think the
main issue with doing this will be setting up the infrastructure.
Simplistically, we need a data store and an API that allows registered 
users to update it. We could make a daily dump available for download by 
anyone wanting it.

Why registered users? I think all data and transactions need to be
auditable, and requiring a userid/password to edit the data helps
provide this; it also cuts down on spam bots.
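
A minimal sketch of such a data store, assuming SQLite through Python's
standard `sqlite3` module. The table layout, the column names (`reckey`,
`edit_date`, `op` and `userid` stand in for jpl's key, date, action and
user) and the PBKDF2 password hashing are assumptions for illustration,
not a design anyone has agreed on. Only registered users can write, each
row records who made the change and when, and `dump` produces sorted
pipe-delimited lines that could be published as the daily download.

```python
import hashlib
import hmac
import os
import sqlite3
from datetime import date

# Column order follows the proposed ss|ccc|file|key|fieldname|date|action|
# oldval|newval|user|comments layout; key/date/action/user are renamed to
# reckey/edit_date/op/userid to stay clear of SQL keywords.
COLUMNS = ("ss", "ccc", "file", "reckey", "fieldname", "edit_date",
           "op", "oldval", "newval", "userid", "comments")

SCHEMA = """
CREATE TABLE IF NOT EXISTS users (
    userid TEXT PRIMARY KEY,
    salt   BLOB NOT NULL,
    pwhash BLOB NOT NULL
);
CREATE TABLE IF NOT EXISTS corrections (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ss TEXT, ccc TEXT, file TEXT, reckey TEXT, fieldname TEXT,
    edit_date TEXT NOT NULL, op TEXT, oldval TEXT, newval TEXT,
    userid TEXT NOT NULL REFERENCES users(userid),
    comments TEXT
);
CREATE INDEX IF NOT EXISTS corrections_lookup
    ON corrections (ss, ccc, file, reckey, fieldname);
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the users reference
    conn.executescript(SCHEMA)
    return conn


def _hash(password, salt):
    # PBKDF2-HMAC-SHA256; the iteration count is an illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(conn, userid, password):
    salt = os.urandom(16)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO users VALUES (?, ?, ?)",
                     (userid, salt, _hash(password, salt)))


def authenticate(conn, userid, password):
    row = conn.execute("SELECT salt, pwhash FROM users WHERE userid = ?",
                       (userid,)).fetchone()
    return row is not None and hmac.compare_digest(row[1], _hash(password, row[0]))


def add_correction(conn, userid, password, ss, ccc, file, reckey,
                   fieldname, op, oldval, newval, comments=""):
    """Record one change; only registered users may write, and every row
    keeps who made it and when, so the history stays auditable."""
    if not authenticate(conn, userid, password):
        raise PermissionError(f"unknown user or wrong password: {userid!r}")
    values = (ss, ccc, file, reckey, fieldname, date.today().isoformat(),
              op, oldval, newval, userid, comments)
    placeholders = ", ".join("?" * len(COLUMNS))
    with conn:
        conn.execute(f"INSERT INTO corrections ({', '.join(COLUMNS)}) "
                     f"VALUES ({placeholders})", values)


def dump(conn):
    """Return all corrections as sorted pipe-delimited lines (daily dump)."""
    rows = conn.execute(
        f"SELECT {', '.join(COLUMNS)} FROM corrections "
        "ORDER BY ss, ccc, file, reckey, fieldname, edit_date, id")
    return ["|".join("" if v is None else str(v) for v in row) for row in rows]
```

A daily cron job could write the output of `dump` to a file for anyone
to download, while the API in front of `add_correction` stays restricted
to registered users.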

It's a great idea, but like all ideas someone needs to do the work, and
I am way overbooked at the moment and can't take on building this. So:

1) Is anyone interested in building this tool?
2) Can we get the geodata team to help out here? Would Census be
interested in building and hosting something like this?
3) Where should we host it if we get that far?
4) Licensing, agreed to during registration, probably needs to be public
domain or something similar, so there are no issues passing the data
back to Census if they are interested.

Thoughts,
   -Stephen Woodbridge

