<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri","sans-serif";}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri","sans-serif";
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-family:"Calibri","sans-serif";}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:282736773;
        mso-list-type:hybrid;
        mso-list-template-ids:-256345486 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
        {mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1
        {mso-list-id:1748068796;
        mso-list-type:hybrid;
        mso-list-template-ids:584511846 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l1:level1
        {mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l1:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l1:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal>Hello All,<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I have been fighting with PostGIS for the last week or two to geocode about 2 million addresses. Here are the challenges that I am facing.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>1)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>I sorted the addresses by zip, address, city to group the similar addresses together.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>2)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>So far I have seen only 50% of rating zero, that means perfect match.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>3)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>Some addresses have Suite# or Apt# in address line 1. Geocode function takes longer time for such addresses<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>4)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>Addresses that are on the interstate, may be on the service road. These are taking longer to code.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>5)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>Getting syntax errors on line_segment function when processing some addresses that have additional data in parenthesis and a period. For example "5727 FM 3097(HORIZON RD.),ROCKWALL,TX,750327786" <o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>6)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>I am using batch updates selecting the addresses per state to reduce the size of the batch. I used straight SQL update as mentioned in the documentation, and wrote a function to call geo code for each address. In both cases the query ran for more than 6 hours and never been able finish successfully. If I kill the query, nothing gets updated to the address table because the whole transaction is killed and not committed.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>7)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>I am running this on Ubuntu 11.10 and postgresql 9.1 on a virtual machine on 8GB i5 machine. I know the resources are very less for such a task.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>My questions are<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>1)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>How other people are geocoding so many addresses, what is the best way to do it, knowing that we may not have fully normalized addresses.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>2)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>How long it should take to geocode about 2mil addresses.<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>3)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>What does it mean if the rating is non-zero, and more than 50. Do we get the distance wring for such addresses?<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l1 level1 lfo2'><![if !supportLists]><span style='mso-list:Ignore'>4)<span style='font:7.0pt "Times New Roman"'>      </span></span><![endif]>What is the recommended hardware configuration? We need to geocode these addresses only once, and periodically to the delta that we get every month. <o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Thank you so much for all the efforts that you guys are contributing to the project. Any help with my questions would greatly be appreciated.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Thanks<o:p></o:p></p><p class=MsoNormal>Ravi Ada<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p></div></body></html>