RNFdev Thinking through polygon creation
Dan Putler
putler at sauder.ubc.ca
Fri Sep 22 12:35:17 EDT 2006
Hi Dave,
> Newbie digitizer eh?... Ohhhh the fun and games we'll have (smile)...
> Concepts are pretty simple. I think we have concluded the first
> semantic discussion that digitizing always brings about...
> Aesthetics VS
> Accuracy. I think we both agree accuracy is more important than a NICE
> looking vector (for the most part). There is an art to digitizing just
> as there appears to be one for geocoding.
I have to admit, the accuracy outlook is based on some quick learning
about the art of digitizing. Specifically, making things
aesthetically pleasing is time consuming (shore lines are a
particularly nasty bit of work, but lining up the polygon edges to
match the road segments is also difficult). Since the primary purpose
of the exercise is to accurately assign road segments to FSAs, I
quickly decided that accuracy mattered most. Moreover, I thought it
would be possible to work on aesthetics later. An FSA polygon layer has
a lot of uses in its own right.
> I'm in for the road segment approach. One big reason is that if for
> some reason the FSA approach changes or information is updated or an
> FSA added we can create ways to update the data. Common edges are good.
Organizing this is going to be tricky. Moreover, there is the
question about what to do with boundaries we need to digitize
ourselves. The process I've used so far is to:
1. Do an initial clean of the boundary road name file you sent. The
cleaning involves mucking with the fields, typically by condensing
the number of fields. You have essentially split things into one word
per field, which makes some sense. However, the file needs to have
three fields: street name (e.g., Main, 45th), street type (e.g.,
Street, Avenue), and street direction (e.g., S, NW). All streets have
a street name, and the vast majority have a street type, but not
always: street names like Queensway, Kingsway, and Broadway often
have no street type value in the RNF, although in other instances
they do. Most streets do not have street direction values, but this
varies from city to city (Vancouver, Edmonton, and Calgary make heavy
use of street directions, although in Vancouver it is mostly only in
the W and E directions). In the RNF
the names of these three fields are NAME, TYPE, and DIRECTION. After
doing the initial cleanup of the file, I save it in DBF format. I
should mention that the RNF documentation lists all the potential
values of street type. Some of these are less than obvious. For
instance, there is a Side Road street type (given as SIDERD), but not
a Line Road street type. So the street 4th Line Road has a NAME field
value of 4th Line, and a TYPE value of RD. The other thing I do at
this stage is remove obvious non-street boundaries from the list
(e.g., Mckay Lake, NCC Bike Path).
2. Use an R script to find the unique combinations of street name,
street type, and street direction in the RNF for the area and then
match this to the FSA boundary road list to determine which street
name, type, and direction combinations don't match. In Ottawa, I
couldn't match about 25% of the two lists after my initial cleaning.
Do a number of queries of the RNF from the R command line to figure
out what the deal is with the ones that don't match. There are a
number of things that can differ between the two lists. Common ones
are slight variations in street names (1st Line versus 1 Line, Thomas
Dolan versus Thomas A. Dolan, St. Joseph versus St Joseph),
differences in street types (Huntmar RD versus Huntmar DR), and
missing direction information in the Canada Post FSA boundary
streets. These differences are idiosyncratic, and so will always
involve some amount of hand cleaning. I think that Walter could
probably help us here by creating a tool, based on PAGC, to help match
the problem children streets. Based on the queries, do the next round
of cleaning of the FSA boundary road list.
3. Use another R script to add a flag field to the FSA boundary road
list and merge it into the RNF based on a combined street name, type,
and direction key. This appends a flag to the RNF indicating whether
a road segment is on a street some part of which is an FSA boundary.
4. Based on the FSA boundary flag, use ogr2ogr to create a new road
layer that only includes roads some part of which makes up an FSA
boundary.
5. It turns out that the NGD_ID field in the RNF is not actually a
unique road segment ID (I learned this the hard way in my earlier
efforts to augment the RNF). As a result, a unique id (RD_SEG)
attribute is added to the reduced road layer via R.
6. Display the layer in a GIS data viewer/editor (I use QGIS), and
then determine (and write down the RD_SEG values of) the road
segments that are on the boundary of a specific polygon. This is a
labour intensive point and click process, but I can't think of a way
to automate it.
7. Run a third R script to create a flag to indicate whether a road
segment is on the border of the specific FSA polygon being created.
After doing this, use ogr2ogr to create a road layer with only the
boundary road segments for that FSA.
8. Bring the FSA specific road segment layer back into the GIS data
viewer/editor (again, I use QGIS, although uDig may be a better
choice for this) and then hand digitize the missing FSA boundary
vectors. In doing this, identify the newly digitized line
segments that will form a common boundary between adjacent FSA
polygons. At this point the FSA polygon isn't really a polygon yet,
but a collection of lines that will form the polygon's edges. Clean
up the vertices of the newly digitized lines to make sure they
don't overlap one another (I do this with uDig since QGIS doesn't
allow you to edit line segments once they are created).
9. Using ogr2ogr, extract the newly digitized common boundary lines
and then add them (again via ogr2ogr) to the layer of potential FSA
boundary road segments. This ensures that adjacent polygons have the
same common borders for edges that were hand digitized.
10. Run a fourth and final R script to convert the set of line
segments into a polygon.
11. Repeat steps 6 to 10 for all polygons in an area.
12. Merge the individual polygons together via ogr2ogr.
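
To make step 10 a bit more concrete, here is a minimal sketch of
chaining an unordered set of boundary line segments into one closed
ring. It is written in Python rather than the R used for the actual
scripts, which are not reproduced here, so treat it as an illustration
only. The function name build_ring is hypothetical, and the sketch
assumes that segments which meet share exactly identical endpoint
coordinates (i.e., the vertices have already been cleaned up as in
step 8).

```python
def build_ring(segments):
    """Chain unordered line segments into one closed ring of vertices.

    Each segment is a pair of (x, y) endpoints. Segments may be listed in
    any order and in either direction. Assumes touching segments share
    exactly equal endpoints. Raises ValueError if the segments do not
    form a single closed loop.
    """
    if not segments:
        raise ValueError("no segments supplied")
    remaining = list(segments)
    start, end = remaining.pop(0)
    ring = [start, end]
    while remaining:
        for i, (a, b) in enumerate(remaining):
            if a == ring[-1]:
                ring.append(b)      # segment runs in the ring's direction
            elif b == ring[-1]:
                ring.append(a)      # segment is reversed, so flip it
            else:
                continue            # does not touch the current end
            remaining.pop(i)
            break
        else:
            # No remaining segment continues from the current end vertex.
            raise ValueError("gap in boundary after vertex %r" % (ring[-1],))
    if ring[0] != ring[-1]:
        raise ValueError("boundary does not close")
    return ring


# Made-up square boundary, with segments out of order and one reversed.
square = [((0, 0), (1, 0)), ((1, 1), (0, 1)),
          ((1, 0), (1, 1)), ((0, 0), (0, 1))]
ring = build_ring(square)
print(ring)
```

A gap left by a missing hand digitized segment shows up as a
ValueError naming the vertex where the chain stops, which is a cheap
way to find which boundary lines still need to be drawn.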
>
> Alright, I think at this point I need some points of clarification
> on CD
> and CS.... I know what they are in concept but you mention them a lot.
> Is there a correlation between the existence of an FSA and a CS/CD?
In urban areas a CSD is typically a town (in greater Vancouver,
Vancouver proper is a CSD (Census Subdivision), so is Richmond,
Surrey, North Vancouver, Coquitlam, etc.). All of greater Vancouver
(the GVRD in local parlance) is a CD (Census Division). Having said
this, in the Ottawa area, things are a bit different. The Ottawa CSD
includes Kanata, Gloucester, and so on within its boundaries. In
rural areas, exactly what a CSD is can be hard to say. In BC, towns
with a population over about 5000 tend to be designated as a CSD, as
are vast, sparsely populated areas. In general, one can say that
Census Divisions are large areas (which in a large metro area
comprise the entire metro area), and Census Subdivisions are sub-
areas within a CD (often corresponding to medium sized towns to large
cities). The nice thing about them is that their polygons provide
shore lines, and provide an initial way of systematically paring down
the province level RNFs to more manageable pieces.
> What was the 1 or 2 hardest part(s) of matching the FSA lists I
> created
> to the DBF of the populated names? Is this something that we should
> create a process or technique for? Someone might be able to script
> something. Almost a pseudo standardizer like we were talking about
> before.
> Is this another chicken and egg scenario?
Steps 1 and 2 of my "Make an FSA Polygon" recipe cover this.
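
As a rough illustration of the matching in step 2, here is a minimal
sketch of building a normalized (name, type, direction) key and
listing the boundary streets that have no counterpart in the RNF. It
is in Python rather than the R used for the actual scripts, which are
not reproduced here. The function names, the small abbreviation table,
and the sample rows are all made up for illustration; only the name
variations (St. Joseph, Huntmar RD versus DR, Thomas A. Dolan) come
from the Ottawa lists, and the real street type values are the ones
listed in the RNF documentation.

```python
import re

# Hypothetical abbreviation table, for illustration only.
TYPE_ABBREVIATIONS = {"ROAD": "RD", "DRIVE": "DR", "STREET": "ST"}


def normalize_name(name):
    """Upper-case, drop periods, and collapse runs of whitespace."""
    cleaned = name.upper().replace(".", "")
    return re.sub(r"\s+", " ", cleaned).strip()


def make_key(name, street_type="", direction=""):
    """Build the combined street name, type, and direction key."""
    street_type = street_type.upper().strip()
    street_type = TYPE_ABBREVIATIONS.get(street_type, street_type)
    return (normalize_name(name), street_type, direction.upper().strip())


def unmatched(boundary_rows, rnf_rows):
    """Return the boundary rows whose key does not appear in the RNF."""
    rnf_keys = {make_key(*row) for row in rnf_rows}
    return [row for row in boundary_rows if make_key(*row) not in rnf_keys]


# Made-up sample rows as (NAME, TYPE, DIRECTION).
rnf = [("St Joseph", "RD", ""), ("Huntmar", "DR", ""),
       ("Thomas A. Dolan", "RD", "")]
boundary = [("St. Joseph", "Road", ""), ("Huntmar", "Road", ""),
            ("Thomas Dolan", "RD", "")]

leftover = unmatched(boundary, rnf)
print(leftover)
```

Normalization of this kind only removes the mechanical differences
(punctuation, case, spelled-out types). The Huntmar and Thomas Dolan
rows still come back unmatched, which is the idiosyncratic part that
needs hand cleaning or a smarter standardizer.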
Sorry this is long, but it starts to lay out the process for
systematically creating FSA polygons. Look over the "recipe" with the
goal of figuring out ways to automate/simplify it.
Dan