Convert CP FSA PDF Maps to Text (2 options presented)
Sampson, David
dsampson at NRCan.gc.ca
Mon Sep 25 11:58:29 EDT 2006
Hey Folks,
Here are two approaches for converting Canada Post's published PDFs
containing the FSA maps to text.
Purpose:
Currently we are extracting information from these PDFs to get a list
of streets that are used as FSA boundaries. FSAs, again, are Forward
Sortation Areas: the first three characters of a Canadian postal
code.
Finding the PDF(s)
All of Canada
1. Download Canada.pdf from here:
http://www.canadapost.ca/common/tools/pg/fsamaps/pdf/Canada.pdf
2. Proceed to Case 1 below (Case 2 fails due to the size of this PDF)
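For anyone who would rather script the download step, here is a minimal Python sketch. The helper name download_pdf is my own (hypothetical), it uses only the standard library, and it assumes the URL above is still being served as-is.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit

# URL taken from step 1 above; it may no longer be live.
CANADA_PDF_URL = "http://www.canadapost.ca/common/tools/pg/fsamaps/pdf/Canada.pdf"


def download_pdf(url, dest_dir="."):
    """Save the file at `url` into `dest_dir` and return the local path.

    Hypothetical helper: the local file name is simply the last
    component of the URL path (e.g. "Canada.pdf").
    """
    name = Path(urlsplit(url).path).name or "download.pdf"
    dest = Path(dest_dir) / name
    # Stream the response to disk so a large PDF is not held in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest

# Example call (needs network access):
#     download_pdf(CANADA_PDF_URL)
```

The same helper should work for the city-specific PDFs, given the URL of the city map.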
City by city:
These steps allow you to download city-specific PDFs from the TOTAL
COUNTS pages:
1. Go to http://www.canadapost.ca/cpc2/addrm/hh/default-e.asp
2. Follow the "Click here for direct access to Householder Counts" link
3. Select "Current Householder Count Data"
4. Select "Total Points of Call"
5. Shortcut URL for the Total Points of Call page:
http://www.canadapost.ca/cpc2/addrm/hh/current/indexp/tpALL-e.asp
6. Find the province of choice
7. Click on the corresponding 'M' graphic icon for the province
8. Click on the corresponding 'M' graphic icon for the city
9. Proceed to either case below
Case 1: Local PDF viewer (cut and paste)
2. In a PDF viewer, use shift-click-drag to select all the text, then
right-click and choose Copy (or use CTRL-C)
3. Open a text editor that has good search and replace capabilities (e.g.
OpenOffice or MS Word)
4. Paste the text without formatting and save the document.
5. Have fun sorting out the list. Lots of manual work is required.
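Some of the manual sorting in step 5 could probably be scripted. The Python sketch below is only a starting point: it assumes the pasted text has one map label per line, and that a line made up of exactly letter-digit-letter is an FSA code while everything else is a candidate street name. The function name clean_pasted_text is hypothetical, and the real pasted output may need more handling than this.

```python
import re

# An FSA is the first three characters of a postal code:
# letter, digit, letter (e.g. "K1A").
FSA_RE = re.compile(r"[A-Z]\d[A-Z]")


def clean_pasted_text(text):
    """Split pasted map text into (fsa_codes, other_lines).

    Both lists are sorted with duplicates removed. Assumes one label
    per line, which may not hold for every PDF.
    """
    fsas, others = set(), set()
    for raw in text.splitlines():
        line = " ".join(raw.split())  # collapse runs of whitespace
        if not line:
            continue  # skip blank lines
        if FSA_RE.fullmatch(line.upper()):
            fsas.add(line.upper())
        else:
            others.add(line)
    return sorted(fsas), sorted(others)
```

Calling clean_pasted_text on the contents of the saved document returns the FSA codes and the remaining labels as two separate lists, which still need to be reviewed by hand.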
Case 2: Online PDF to TEXT converter
1. Use Adobe's online conversion tool:
http://www.adobe.com/products/acrobat/access_onlinetools.html
2. Enter the URL of the city PDF
3. Choose TEXT (HTML fails due to processing time)
4. Fill in the other fields and comments
5. Click 'Convert' and wait (have a coffee)
Sample URL (I think there is a session ID buried in there):
http://access.adobe.com/access/getStatus.do?jobid=71ce4a9c-53ee-49d8-bc55-51dd5dfb2ee1&srcPdfUrl=http://www.canadapost.ca/cpc2/addrm/hh/maps/FSA/ON20.pdf&convertTo=html&visuallyImpaired=preferhtml&preferHTMLReason=require%20text%20for%20processing&platform=Windows&comments=Thanks%20for%20the%20service&starttime=1159200639296
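To see what is packed into a status URL like the sample above, the query string can be split with Python's standard library. This sketch only parses the URL and does not contact the service. The function name describe_job_url is hypothetical, and reading jobid as the suspected session ID is a guess based on the parameter name.

```python
from urllib.parse import parse_qs, urlsplit

# The sample status URL from this post, rejoined onto one logical line.
SAMPLE_URL = (
    "http://access.adobe.com/access/getStatus.do"
    "?jobid=71ce4a9c-53ee-49d8-bc55-51dd5dfb2ee1"
    "&srcPdfUrl=http://www.canadapost.ca/cpc2/addrm/hh/maps/FSA/ON20.pdf"
    "&convertTo=html&visuallyImpaired=preferhtml"
    "&preferHTMLReason=require%20text%20for%20processing"
    "&platform=Windows&comments=Thanks%20for%20the%20service"
    "&starttime=1159200639296"
)


def describe_job_url(url):
    """Return the query parameters of a status URL as a plain dict.

    parse_qs decodes %20 escapes and returns a list per key; each key
    appears once here, so only the first value is kept.
    """
    query = parse_qs(urlsplit(url).query)
    return {key: values[0] for key, values in query.items()}


params = describe_job_url(SAMPLE_URL)
for key in ("jobid", "srcPdfUrl", "convertTo"):
    print(key, "=", params[key])
```

The srcPdfUrl value is the city PDF being converted, and convertTo records the output format that was requested.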
The text resulting from either case appears to be the same, so I have not
seen an advantage of one over the other yet. It comes down to personal
preference.