RNFdev Thinking through polygon creation

Dan Putler putler at sauder.ubc.ca
Fri Sep 22 12:35:17 EDT 2006


Hi Dave,

> Newbie digitizer eh?... Ohhhh the fun and games we'll have (smile)...
> Concepts are pretty simple. I think we have concluded the first
> semantic discussion that digitizing always brings about... Aesthetics
> VS Accuracy. I think we both agree accuracy is more important than a
> NICE looking vector (for the most part). There is an art to digitizing
> just as there appears to be one for geocoding.

I have to admit, my preference for accuracy is based on some quick
learning about the art of digitizing. Specifically, making things
aesthetically pleasing is time consuming (shorelines are a
particularly nasty bit of work, but lining up the polygon edges to
match the road segments is also difficult). Since the primary purpose
of the exercise is to accurately assign road segments to FSAs, I
quickly decided that accuracy mattered most. Moreover, I thought it
would be possible to work on aesthetics later. An FSA polygon layer
has a lot of uses in its own right.

> I'm in for the road segment approach. One big reason is that if for
> some reason the FSA approach changes or information is updated or an
> FSA added we can create ways to update the data. Common edges are
> good.

Organizing this is going to be tricky. Moreover, there is the
question about what to do with boundaries we need to digitize
ourselves. The process I've used so far is to:

1. Do an initial clean of the boundary road name file you sent. The
cleaning involves mucking with the fields, typically by condensing
the number of fields. You have essentially split things into one word
per field, which makes some sense. However, the file needs to have
three fields: street name (e.g., Main, 45th), street type (e.g.,
Street, Avenue), and street direction (e.g., S, NW). All streets have
a street name, and the vast majority have a street type, but not
always: street names like Queensway, Kingsway, and Broadway often
have no street type value in the RNF, although in other instances
they do. Most streets do not have street direction values, but this
varies from city to city (Vancouver, Edmonton, and Calgary make heavy
use of street directions, although in Vancouver it is mostly only the
W and E directions). In the RNF the names of these three fields are
NAME, TYPE, and DIRECTION. After doing the initial cleanup of the
file, I save it in DBF format. I should mention that the RNF
documentation lists all the potential values of street type. Some of
these are less than obvious. For instance, there is a Side Road
street type (given as SIDERD), but not a Line Road street type. So
the street 4th Line Road has a NAME value of 4th Line and a TYPE
value of RD. The other thing I do at this stage is remove obvious
non-street boundaries from the list (e.g., Mckay Lake, NCC Bike
Path).

2. Use an R script to find the unique combinations of street name,
street type, and street direction in the RNF for the area and then
match this to the FSA boundary road list to determine which street
name, type, and direction combinations don't match. In Ottawa, about
25% of the entries could not be matched between the two lists after
my initial cleaning. Do a number of queries of the RNF from the R
command line to figure out why the unmatched ones don't match. There
are a number of things that can differ between the two lists. Common
ones are slight variations in street names (1st Line versus 1 Line,
Thomas Dolan versus Thomas A. Dolan, St. Joseph versus St Joseph),
differences in street types (Huntmar RD versus Huntmar DR), and
missing direction information in the Canada Post FSA boundary
streets. These differences are idiosyncratic, and so will always
involve some amount of hand cleaning. I think that Walter could
probably help us here by creating a tool, based on PAGC, to help
match the problem streets. Based on the queries, do the next round of
cleaning of the FSA boundary road list.

3. Use another R script to add a flag field to the FSA boundary road  
list and merge it into the RNF based on a combined street name, type,  
and direction key. This appends a flag to the RNF indicating whether
a road segment is on a street some part of which is an FSA boundary.

4. Based on the FSA boundary flag, use ogr2ogr to create a new road
layer that includes only roads some part of which makes up an FSA
boundary.

5. It turns out that the NGD_ID field in the RNF is not actually a  
unique road segment ID (I learned this the hard way in my earlier  
efforts to augment the RNF). As a result, a unique id (RD_SEG)  
attribute is added to the reduced road layer via R.

6. Display the layer in a GIS data viewer/editor (I use QGIS), and  
then determine (and write down the RD_SEG values of) the road  
segments that are on the boundary of a specific polygon. This is a  
labour intensive point and click process, but I can't think of a way  
to automate it.

7. Run a third R script to create a flag to indicate whether a road  
segment is on the border of the specific FSA polygon being created.  
After doing this, use ogr2ogr to create a road layer with only the  
boundary road segments for that FSA.

8. Bring the FSA-specific road segment layer back into the GIS data
viewer/editor (again, I use QGIS, although uDig may be a better
choice for this) and then hand digitize the missing FSA boundary
vectors. In doing this, identify the newly digitized line segments
that will form a common boundary between adjacent FSA polygons. At
this point the FSA polygon isn't really a polygon yet, but a
collection of lines that will form the polygon's edges. Clean up the
vertices of the newly digitized lines to make sure they don't overlap
one another (I do this with uDig since QGIS doesn't allow you to edit
line segments once they are created).

9. Using ogr2ogr, extract the newly digitized common boundary lines
and then add them (again via ogr2ogr) to the layer of potential FSA
boundary road segments. This ensures that adjacent polygons have the
same common borders for edges that were hand digitized.

10. Run a fourth and final R script to convert the set of line  
segments into a polygon.

11. Repeat steps 6 to 10 for all polygons in an area.

12. Merge the individual polygons together via ogr2ogr.
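
To make steps 2 through 5 a bit more concrete, here is a minimal
sketch in Python (not the R scripts referred to above). It assumes
both lists have already been read into records carrying the NAME,
TYPE, and DIRECTION fields. The sample records, the helper names, and
the FSA_BNDRY flag field are hypothetical; only NAME, TYPE,
DIRECTION, and RD_SEG come from the recipe.

```python
# Hypothetical sample records. Real data would come from the RNF DBF and
# from the cleaned FSA boundary road list.
rnf_segments = [
    {"NAME": "Huntmar", "TYPE": "DR", "DIRECTION": ""},
    {"NAME": "4th Line", "TYPE": "RD", "DIRECTION": ""},
    {"NAME": "Main", "TYPE": "ST", "DIRECTION": "W"},
    {"NAME": "Main", "TYPE": "ST", "DIRECTION": "W"},
]
boundary_streets = [
    {"NAME": "Huntmar", "TYPE": "RD", "DIRECTION": ""},
    {"NAME": "4th Line", "TYPE": "RD", "DIRECTION": ""},
]


def street_key(rec):
    """Combined name/type/direction key, ignoring case and padding."""
    return tuple(rec[f].strip().upper() for f in ("NAME", "TYPE", "DIRECTION"))


def unmatched_boundary_streets(rnf, boundary):
    """Step 2: boundary streets whose key never occurs in the RNF."""
    rnf_keys = {street_key(r) for r in rnf}
    return sorted({street_key(b) for b in boundary} - rnf_keys)


def flag_boundary_segments(rnf, boundary):
    """Step 3: copy each RNF record, adding a 0/1 FSA_BNDRY flag."""
    boundary_keys = {street_key(b) for b in boundary}
    return [dict(r, FSA_BNDRY=int(street_key(r) in boundary_keys)) for r in rnf]


def reduce_and_number(flagged):
    """Steps 4 and 5: keep flagged segments and number them with RD_SEG."""
    kept = [r for r in flagged if r["FSA_BNDRY"] == 1]
    return [dict(r, RD_SEG=i) for i, r in enumerate(kept, start=1)]


unmatched = unmatched_boundary_streets(rnf_segments, boundary_streets)
flagged = flag_boundary_segments(rnf_segments, boundary_streets)
reduced = reduce_and_number(flagged)
```

With the sample data, the Huntmar RD entry is the one reported as
unmatched, because the RNF records carry Huntmar DR. That is the kind
of street type difference that still needs hand cleaning. In the
recipe itself the reduction in step 4 is done with ogr2ogr; the list
filter here only stands in for it.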

>
> Alright, I think at this point I need some points of clarification on
> CD and CS.... I know what they are in concept but you mention them a
> lot. Is there a correlation between the existence of an FSA and a
> CS/CD?

In urban areas a CSD (Census Subdivision) is typically a town. In
greater Vancouver, Vancouver proper is a CSD, and so are Richmond,
Surrey, North Vancouver, Coquitlam, etc. All of greater Vancouver
(the GVRD in local parlance) is a CD (Census Division). Having said
this, in the Ottawa area, things are a bit different. The Ottawa CSD
includes Kanata, Gloucester, and so on within its boundaries. In
rural areas, exactly what a CSD is is hard to say. In BC, towns with
a population over about 5000 tend to be designated as a CSD, as are
vast, sparsely populated areas. In general, one can say that Census
Divisions are large areas (which in a large metro area comprise the
entire metro area), and Census Subdivisions are sub-areas within a CD
(often corresponding to medium sized towns to large cities). The nice
thing about them is that their polygons provide shorelines, and
provide an initial way of systematically paring down the province
level RNFs to more manageable pieces.

> What were the 1 or 2 hardest part(s) of matching the FSA lists I
> created to the DBF of the populated names? Is this something that we
> should create a process or technique for? Someone might be able to
> script something. Almost a pseudo standardizer like we were talking
> about before.
> Is this another chicken and egg scenario?

Steps 1 and 2 of my "Make an FSA Polygon" recipe cover this.
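
As a rough idea of what a scripted pseudo standardizer might look
like, here is a minimal Python sketch that folds together the three
name variations mentioned in step 2. The function name and the rules
are assumptions made for illustration. It is not a substitute for a
PAGC-based tool, and the rule that drops a single-letter middle
initial could easily be too aggressive on real data.

```python
import re

# Matches a number with an ordinal suffix, e.g. 1ST, 2ND, 3RD, 45TH.
ORDINAL = re.compile(r"^(\d+)(ST|ND|RD|TH)$")


def normalize_name(name):
    """Return a crude comparison key for a street NAME value (assumed rules)."""
    # Upper-case and drop periods so "St. Joseph" and "St Joseph" agree.
    tokens = name.upper().replace(".", "").split()
    # Strip ordinal suffixes so "1st Line" and "1 Line" agree.
    tokens = [ORDINAL.sub(r"\1", t) for t in tokens]
    # Drop single-letter initials that sit between other words, so that
    # "Thomas A. Dolan" and "Thomas Dolan" agree. First and last tokens
    # are always kept.
    if len(tokens) >= 3:
        middle = [t for t in tokens[1:-1] if not (len(t) == 1 and t.isalpha())]
        tokens = [tokens[0]] + middle + [tokens[-1]]
    return " ".join(tokens)


pairs = [
    ("1st Line", "1 Line"),
    ("Thomas Dolan", "Thomas A. Dolan"),
    ("St. Joseph", "St Joseph"),
]
agree = [normalize_name(a) == normalize_name(b) for a, b in pairs]
```

The normalized value is meant only as a matching key. The original
NAME value should be kept alongside it, since street type differences
(Huntmar RD versus Huntmar DR) and missing directions still need to
be looked at by hand.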

Sorry this is long, but it starts to lay out the process for
systematically creating FSA polygons. Look over the "recipe" with the
goal of figuring out ways to automate/simplify it.
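
As an illustration of what step 10 involves, here is a minimal Python
sketch (again, not the R script referred to in the recipe) that
chains a set of boundary line segments end to end into a closed ring.
It assumes each segment is a list of (x, y) vertices and that
adjoining segments share exactly identical end points, which is why
the vertex clean-up in step 8 matters. The function name is
hypothetical.

```python
def assemble_ring(segments):
    """Chain line segments end to end into one closed ring of vertices.

    Each segment is a list of (x, y) tuples. Segments may be listed in
    any order and may run in either direction. Raises ValueError if the
    chain has a gap or does not close.
    """
    remaining = [list(s) for s in segments]
    ring = remaining.pop(0)
    while remaining:
        tail = ring[-1]
        for i, seg in enumerate(remaining):
            if seg[0] == tail:
                # Segment continues from the tail in its stored direction.
                ring.extend(seg[1:])
                break
            if seg[-1] == tail:
                # Segment continues from the tail once reversed.
                ring.extend(reversed(seg[:-1]))
                break
        else:
            raise ValueError("gap in boundary after vertex %r" % (tail,))
        remaining.pop(i)
    if ring[0] != ring[-1]:
        raise ValueError("boundary does not close")
    return ring


# Four edges of a unit square, out of order and not consistently directed.
square_edges = [
    [(0, 0), (1, 0)],
    [(1, 1), (1, 0)],
    [(1, 1), (0, 1)],
    [(0, 1), (0, 0)],
]
ring = assemble_ring(square_edges)
```

A gap raises an error rather than being silently bridged, which
points back to the segment that still needs hand digitizing. Real
coordinates would likely need snapping to a tolerance before an exact
comparison like this could work.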

Dan



