Polygon creation (step 1)

Sampson, David dsampson at NRCan.gc.ca
Fri Sep 22 13:19:46 EDT 2006


 I thought of breaking the process apart into smaller chunks. Let me
know if this is annoying.

Original thread Subject: Re: RNFdev Thinking through polygon creation


=========
1. Do an initial clean of the boundary road name file you send. The
cleaning involves mucking with the fields, typically by condensing the
number of fields. You have essentially split things into one word per
field, which makes some sense. However, the file needs to have three
fields: street name (e.g., Main, 45th), street type (e.g., Street,
Avenue), and street direction (e.g., S, NW). All streets have a street
name, and the vast majority have a street type, but not always: it is
not uncommon for street names like Queensway, Kingsway, and Broadway to
have no street type value in the RNF, although in other instances they
do. Most streets do not have street direction values, but this varies
from city to city (Vancouver, Edmonton, and Calgary make heavy use of
street directions, although in Vancouver it is mostly only in the W and
E directions). In the RNF the names of these three fields
are NAME, TYPE, and DIRECTION. After doing the initial cleanup of the
file, I save it in DBF format. I should mention that the RNF
documentation lists all the potential values of street type. Some of
this is less than obvious. For instance, there is a Side Road street
type (given as SIDERD), but not a Line Road street type. So the street
4th Line Road has a NAME field value of 4th Line, and a TYPE value of
RD. The other thing I do at this stage is remove obvious non-street
boundaries from the list (e.g., Mckay Lake, NCC Bike Path).

2. ...I think that Walter could probably help us here by creating a tool,
based on PAGC, to help match the problem children streets. Based on the
queries, do the next round of cleaning of the FSA boundary road list...
============
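
As a rough sketch of the three-field split described in the quoted step
1, something like the following Python might do. The type and direction
tables are small illustrative subsets that I am assuming here, not the
full list from the RNF documentation, and split_street is a hypothetical
helper name.

```python
# Illustrative subsets only (assumed); the RNF documentation lists the
# real street type values.
STREET_TYPES = {
    "STREET": "ST", "ST": "ST",
    "AVENUE": "AVE", "AVE": "AVE",
    "ROAD": "RD", "RD": "RD",
    "SIDE ROAD": "SIDERD", "SIDEROAD": "SIDERD", "SIDERD": "SIDERD",
}
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def split_street(raw):
    """Split a raw street string into (NAME, TYPE, DIRECTION)."""
    # Periods are dropped so that "St." and "St" look the same.
    words = raw.replace(".", "").split()
    direction = ""
    street_type = ""
    # A trailing compass token is treated as the street direction.
    if len(words) > 1 and words[-1].upper() in DIRECTIONS:
        direction = words.pop().upper()
    # Try a two-word type first (Side Road), then a one-word type,
    # always leaving at least one word for the street name.
    for size in (2, 1):
        if len(words) > size:
            candidate = " ".join(words[-size:]).upper()
            if candidate in STREET_TYPES:
                street_type = STREET_TYPES[candidate]
                words = words[:-size]
                break
    return " ".join(words), street_type, direction
```

Because there is no Line Road entry in the table, "4th Line Road" falls
through to the one-word type and keeps "4th Line" as the NAME, which is
the behaviour the quoted step describes. A name such as Queensway simply
comes back with empty TYPE and DIRECTION values.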

I know one of the future products is a street name standardizer, but
what about a pseudo one based on what we already have?

1. We have a given list of streets from the RNF files.
2. What percentage match would we have? Most of them? Maybe at least on a
CD/CSD level?
3. We have a dog's breakfast of FSA boundary roads that gets mashed up
from PDF extraction.
4. We can build either multi-field tables (based on the space delimiter,
hence my table) or a single-field table (concatenate all fields with no
delimiter).
5. Parse the single field (or each field) to match against the RNF index.
Perform a probability check and ask for human intervention when required
or when unmatched. Again on a CSD level?
5.5 Use the parse results to delimit text strings into the appropriate
fields.
6. Use the PAGC gazetteer for ST. Joseph or ST Joseph prefix distinctions
(a huge issue when the time comes to do Quebec).
7. Use the PAGC gazetteer for RD/ROAD suffix distinctions or 1st line / 1
line numbering issues.
8. Recognize key prefixes/suffixes/types (already in PAGC).
9. Dump the list of unmatched names.
10. Manually repeat for valid dumped names. Ditch the lakes and rivers.
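
To make the match / review / dump steps above a bit more concrete, here
is a minimal sketch in Python. It uses the standard difflib module as a
stand-in for the probability check; it is not PAGC and does not use the
PAGC gazetteer. The function names and the two thresholds (accept,
review) are assumptions of mine and would need tuning against real RNF
data.

```python
import difflib


def normalize(name):
    """Uppercase, drop periods, collapse spaces (ST. JOSEPH -> ST JOSEPH)."""
    return " ".join(name.upper().replace(".", "").split())


def match_streets(candidates, rnf_names, accept=0.9, review=0.7):
    """Sort candidate names into matched, needs-review, and unmatched."""
    # Index of normalized RNF names back to their original spelling.
    index = {normalize(n): n for n in rnf_names}
    keys = list(index)
    matched, needs_review, unmatched = {}, {}, []
    for raw in candidates:
        key = normalize(raw)
        if key in index:
            # Exact hit after normalization.
            matched[raw] = index[key]
            continue
        close = difflib.get_close_matches(key, keys, n=1, cutoff=review)
        if not close:
            # Nothing similar enough: goes to the dump list.
            unmatched.append(raw)
            continue
        score = difflib.SequenceMatcher(None, key, close[0]).ratio()
        if score >= accept:
            matched[raw] = index[close[0]]
        else:
            # Plausible but uncertain: ask a human.
            needs_review[raw] = index[close[0]]
    return matched, needs_review, unmatched
```

Running this one CSD at a time would just mean passing in the RNF names
for that CSD as rnf_names. The unmatched list is what gets dumped, and
lakes and bike paths should end up there for manual removal.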

I think the secret is to work on local chunks and then take the results
to build on a wider scale until we have the province.

I know we talked about this before and the code exists or could exist,
but at least this way we have a clean road list to work with. Let me
know if there is logic I missed again. It is definitely a coding issue
for list parsing and sorting, but it seems to be a simple string parse
and match issue.

I shall try to clean up the next list.



