[GRASS-user] Help: Completely confused about multi-layered vectors trying to import TIGER/Line files

Fri Feb 29 10:45:42 EST 2008

On Fri, Feb 29, 2008 at 09:37:36AM -0500, we recorded a bogon-computron collision of the <dkindem at chartermi.net> flavor, containing:
> If it's helpful, you can also extract attributes to individual .shp files 
> using ogr2ogr -where.  For example
> 
> ogr2ogr -where 'cfcc = B11' output.shp input_file.rt1 CompleteChain
> 
> (B11 is the TigerLine attribute code for main line railroads)
> 
> The individual layers can then be brought into GRASS.
> 
> There is a good explanation of this in the book Mapping Hacks (it's Hack 
> #68).

This is fine for linear features, which are all stored in CompleteChain with
their attributes attached.

The problem is for area features, for which NO attributes like CFCC are
available until the AreaLandmarks and Polygon tables are merged with the
PIP records. That's what I'm on about.  You can't do

  ogr2ogr -where "CFCC=D83" output.shp input_file.rt1 Landmarks

and get polygons of the national parks --- you would at best get a table
of attributes with no geometry.  

It seems that the only thing to do is a table join, which requires the
use of something other than the DBF driver, as Michael points out (I have
been using sqlite for that reason).  Perhaps doing a table join of 
AreaLandmarks to PIP using the POLYID key, then selecting with v.extract only 
those features that have non-null values in AreaLandmarks attributes will get 
me what I want.  I will try that next.

> On Feb 28, 2008, at 10:57 AM, Dylan Beaudette wrote:
> 
>> On Thu, Feb 28, 2008 at 7:39 AM, Tom Russo <russo at bogodyn.org> wrote:
>>> I have been trying to wrap my brain around "multi-layered" GRASS vectors 
>>> and
>>>  have only succeeded in wrapping my brain into knots.  Perhaps someone 
>>> here with
>>>  a solid understanding of this stuff can help me.
>>> 
>>>  I'm trying to figure out how to import TIGER/Line data and actually get 
>>> the
>>>  attributes of areas pulled in.  This is trouble.
>>> 
>>>  The v.in.ogr documentation has an example of how to do it:
>>> 
>>> 
>>>  v.in.ogr  dsn=~/TIGER/BC_TGR  layer=CompleteChain,PIP output=t35001_all 
>>> \
>>>                      type=boundary,centroid snap=-1
>>> 
>>>  which does indeed import the CompleteChain layer and PIP (Polygon 
>>> Internal
>>>  Point) layers --- it puts the boundaries in layer 1 and the centroids in
>>>  layer 2, and if I do a
>>>   d.vect t35001_all layer=2
>>>  I can see the areas just fine.
>>> 
>>>  Thing is, TIGER data has almost no usable attributes in the PIP layer 
>>> itself
>>>  --- this is nothing more than a centroid with an attribute "POLYID", 
>>> telling
>>>  nothing about the area.  To get that information, one needs additional 
>>> tables
>>>  linked to the POLYID field.  The TIGER data tries to be a normal form 
>>> database,
>>>  so there are many tables with no geometry that relate back to linking 
>>> fields
>>>  in the attribute tables that are attached to geometry.
>>> 
>>>  There's another table in the TIGER data called "Polygon" that has more
>>>  attributes related to the census, such as congressional district and 
>>> such.
>>>  Actual names  and types of areas (parks, military installations, etc) 
>>> are
>>>  linked through two tables, AreaLandmarks and Landmarks --- the first 
>>> links
>>>  POLYID to a LAND attribute (an integer) and the second links the LAND
>>>  attribute to an actual name --- there is a many-to-one relationship 
>>> between
>>>  AreaLandmarks and Landmarks.  The latter table actually contains the 
>>> feature
>>>  class code that tells you what type of landmark the polygon represents.  
>>> Not
>>>  every polygon has a matching record in AreaLandmarks, only those that 
>>> actually
>>>  represent landmarks.  The other polygons are just administrative regions 
>>> or
>>>  other artificial areas.  On top of that, there appear to be some records 
>>> of
>>>  Landmarks that do NOT have any connection to records in AreaLandmarks -- 
>>> these
>>>  are point landmarks, and so the Landmarks layer of the TIGER data also 
>>> has
>>>  point geometry.  Where there is a relationship between Landmarks and
>>>  AreaLandmarks I suppose the Landmark point would be usable as a label 
>>> point.
>>> 
>>>  Attaching the "Polygon" records to the centroids seems an easy problem, 
>>> as
>>>  every centroid has a Polygon record with attributes, and all that's 
>>> needed
>>>  is a table join on the POLYID field between the table already attached 
>>> to the
>>>  centroids and the Polygon table imported with db.in.ogr.  I can do that.
>>> 
>>>  The AreaLandmarks are another matter --- only a small fraction of the
>>>  polygons have associated AreaLandmarks.  For example, in the TIGER/Line 
>>> data
>>>  for Bernalillo County, New Mexico, there are 18597 PIP and Polygon 
>>> records,
>>>  but only 1292 AreaLandmarks records.  This makes it seem to me that
>>>  AreaLandmarks would be a prime candidate for a third "layer" in the
>>>  "t35001_all" vector.  My problem is selecting the geometric objects to 
>>> tie to
>>>  the records of the AreaLandmarks table.
>>> 
>>>  I have tried a number of naive things that didn't work.  First, I tried 
>>> to
>>>  add a "cat" field to the AreaLandmarks table, then use an SQL UPDATE 
>>> query
>>>  to copy the cat column of the table associated with the PIP records 
>>> (which
>>>  gets called "t35001_all_2" by the v.in.ogr import) that have POLYIDs 
>>> that
>>>  match the ones in the AreaLandmarks:
>>> 
>>>    echo "UPDATE t35001_AreaLandmarks SET cat=(SELECT cat FROM 
>>> t35001_all_2 WHERE t35001_all_2.POLYID=t35001_AreaLandmarks.POLYID)" | 
>>> db.execute
>>> 
>>>  and then adding AreaLandmarks as a third database connection.  That SQL
>>>  update worked fine in the sense that the AreaLandmarks table got the 
>>> right
>>>  cat values out of the layer 2 table, but the approach didn't work, 
>>> because
>>>  as far as I understand it the categories of objects in layer 2 haven't
>>>  actually been assigned to geometric objects for layer 3 yet, so linking 
>>> this
>>>  way is meaningless at this step.  Attaching the table gives me some 
>>> warnings,
>>>  and apparently attaches no database records to any geometry at all.
>>> 
>>>  So then I tried creating a new reclassified vector with
>>> 
>>>   v.reclass in=t35001_all out=foo col=POLYID layer=2 type=centroid
>>> 
>>>  setting the categories of layer 2 in the new vector to be equal to the 
>>> POLYID
>>>  of layer 2 in the old vector without copying the table, then doing:
>>> 
>>>   echo  "UPDATE t35001_AreaLandmarks SET cat=POLYID" | db.execute
>>> 
>>>  and attaching that table to layer 2 of the reclassified vector.  This 
>>> sorta
>>>  kinda worked, in that the AreaLandmarks records were attached to 
>>> features
>>>  properly, but left a very large fraction of the features without 
>>> database
>>>  connections.  So I could display vector "foo" and use d.what.vect to 
>>> query
>>>  my new attributes, but there were 17305 polygons out of 18597 that I 
>>> could
>>>  click and get a warning of a missing database entry --- I would rather 
>>> have
>>>  only the polygons that have an AreaLandmarks record show up at all when
>>>  displaying this layer.
>>> 
>> 
>> Hi Tom,
>> 
>> TIGER data (in its current format) is a beast to work with-- in any
>> GIS. I have found that the simplest way to get "normal" looking data
>> out of the TIGER files was with the tutorial on processing these data
>> using PostGIS [1]. There are excellent instructions on the GeoServer
>> page [1] that make use of basic PostGIS functionality, along with some
>> self-contained java utilities. Following those instructions I was able
>> to get *most* of the point, line, and polygon data (with meaningful
>> attributes) extracted to simple tables in PostGIS. It would be a
>> simple matter to export to GRASS from there.
>> 
>> I think that you can get 2000 TIGER data, by county, from our favorite
>> 'leading GIS vendor' for free [2]. That format should be simpler to
>> use.
>> 
>> There is also tiger2shp [3] which is now distributed for free (Windows).
>> 
>> Finally, it looks like the US Census will be moving away from the
>> current data format, and will be henceforth distributing their data as
>> shapefiles [4]. Hurray!
>> 
>> 1. http://geoserver.org/display/GEOSDOC/Loading+TIGER+data
>> 2. http://www.esri.com/data/download/census2000_tigerline/index.html
>> 3. http://tnatlas.geog.utk.edu/downloadfree.htm
>> 4. http://www.census.gov/geo/www/tiger/future/future_tl.html
>> 
>> 
>>>  Since I'm rambling now, I'll just cut to the chase and ask my question:
>>> 
>>>   Given a GRASS vector with two attached database tables, one of which 
>>> (layer
>>>   2, attaching attributes to centrois) has a key field POLYID, how can I 
>>> create
>>>   a new layer using a third, much smaller table, that also has a POLYID 
>>> field,
>>>   such that the new layer only contains that subset of centroids where 
>>> the
>>>   third layer table's POLYID matches layer 2's POLYID for that centroid?
>>> 
>>>  At some later point I'll worry about the Landmarks point layer and its
>>>  associated attributes, after I've figured out what the story is with
>>>  connecting the AreaLandmarks to the subset of centroids.
>>> 
>>>  I am convinced that if I truly understood how layers work in GRASS then 
>>> my
>>>  question's answer would be self-evident, but I also know that "if A then 
>>> B"
>>>  is a true statement when both A and B are false.  I have been completely
>>>  unable to piece together a thorough (or even marginal) understanding 
>>> from man
>>>  pages and even the third edition "Open Source GIS: A GRASS GIS Approach"
>>>  book.   Any help anyone can give me would be greatly appreciated.
>> 
>> I cannot offer any insight into the GRASS vector/layers system right
>> now, but there are several good threads from a couple of months ago on
>> this very topic (vector/layers/confusion).
>> 
>> Good luck,
>> 
>> Dylan
>> 
>> 
>>>  --
>>>  Tom Russo    KM5VY   SAR502   DM64ux          
>>> http://www.swcp.com/~russo/
>>>  Tijeras, NM  QRPL#1592 K2#398  SOC#236 AHTB#1 
>>> http://kevan.org/brain.cgi?DDTNM
>>>  "And, isn't sanity really just a one-trick pony anyway? I mean all you 
>>> get is
>>>   one trick, rational thinking, but when you're good and crazy, oooh, 
>>> oooh,
>>>   oooh, the sky is the limit!"  --- The Tick
>>>  _______________________________________________
>>>  grass-user mailing list
>>>  grass-user at lists.osgeo.org
>>>  http://lists.osgeo.org/mailman/listinfo/grass-user
>>> 
>> _______________________________________________
>> grass-user mailing list
>> grass-user at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/grass-user
> 

-- 
Tom Russo    KM5VY   SAR502   DM64ux          http://www.swcp.com/~russo/
Tijeras, NM  QRPL#1592 K2#398  SOC#236 AHTB#1 http://kevan.org/brain.cgi?DDTNM
"And, isn't sanity really just a one-trick pony anyway? I mean all you get is
 one trick, rational thinking, but when you're good and crazy, oooh, oooh,
 oooh, the sky is the limit!"  --- The Tick