<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word" xmlns:m =
"http://schemas.microsoft.com/office/2004/12/omml"><HEAD>
<META content="text/html; charset=us-ascii" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 9.00.8112.16437">
<STYLE>@font-face {
font-family: Calibri;
}
@font-face {
font-family: Tahoma;
}
@page WordSection1 {size: 8.5in 11.0in; margin: 1.0in 1.0in 1.0in 1.0in; }
P.MsoNormal {
MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"; FONT-SIZE: 12pt
}
LI.MsoNormal {
MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"; FONT-SIZE: 12pt
}
DIV.MsoNormal {
MARGIN: 0in 0in 0pt; FONT-FAMILY: "Times New Roman","serif"; FONT-SIZE: 12pt
}
A:link {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlink {
COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
A:visited {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlinkFollowed {
COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
P {
FONT-FAMILY: "Times New Roman","serif"; MARGIN-LEFT: 0in; FONT-SIZE: 12pt; MARGIN-RIGHT: 0in; mso-style-priority: 99; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
SPAN.EmailStyle18 {
FONT-FAMILY: "Calibri","sans-serif"; COLOR: #1f497d; mso-style-type: personal-reply
}
.MsoChpDefault {
FONT-FAMILY: "Calibri","sans-serif"; mso-style-type: export-only
}
DIV.WordSection1 {
page: WordSection1
}
</STYLE>
<!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></HEAD>
<BODY lang=EN-US vLink=purple link=blue>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial>Ravi,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial>If you have pgAdmin, you might want to try something like
pgScript. Unlike plpgsql it commits on each update so doesn't suffer from
the same issue as a stored proc.</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial><A
href="http://www.postgresonline.com/journal/archives/181-pgscript_intro.html">http://www.postgresonline.com/journal/archives/181-pgscript_intro.html</A></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial>Hope that helps,</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial>Regina</FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial><A
href="http://www.postgis.us">http://www.postgis.us</A></FONT></SPAN></DIV>
<DIV dir=ltr align=left><SPAN class=159303614-30112011><FONT color=#0000ff
size=2 face=Arial></FONT></SPAN> </DIV><BR>
<BLOCKQUOTE
style="BORDER-LEFT: #0000ff 2px solid; PADDING-LEFT: 5px; MARGIN-LEFT: 5px; MARGIN-RIGHT: 0px"
dir=ltr>
<DIV dir=ltr lang=en-us class=OutlookMessageHeader align=left>
<HR tabIndex=-1>
<FONT size=2 face=Tahoma><B>From:</B>
postgis-users-bounces@postgis.refractions.net
[mailto:postgis-users-bounces@postgis.refractions.net] <B>On Behalf Of
</B>Ravi ada<BR><B>Sent:</B> Wednesday, November 30, 2011 7:06
AM<BR><B>To:</B> 'PostGIS Users Discussion'<BR><B>Subject:</B> Re:
[postgis-users] Batch geocoding (~2 mil addresses)<BR></FONT><BR></DIV>
<DIV></DIV>
<DIV class=WordSection1>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">Thanks
Jonathan for the quick reply.<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt"><o:p> </o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">I
am doing the similar thing as you did except doing it locally. I got the
address divided into 4 columns, address, city, state and zip. I was able to
query for each state and run the geocode. I was never successful running the
complete state. The query takes forever to finish even though the query plan
shows it is using index scan etc. I even wrote a psql function to pass
one address at a time to get the geocode, this one also could not complete
because I could not commit each address as I get the gecode and ended up
running out of resources.<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">How
you were able to geocode by state, did you write any scripts or psql
functions? If yes, do you mind sharing? I am having hard time geocoding these
addresses.<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">Appreciate
any help is extended.<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt"><o:p> </o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">Thanks,<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt">Ravi
Ada<o:p></o:p></SPAN></P>
<P class=MsoNormal><SPAN
style="FONT-FAMILY: 'Calibri','sans-serif'; COLOR: #1f497d; FONT-SIZE: 11pt"><o:p> </o:p></SPAN></P>
<P class=MsoNormal><B><SPAN
style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt">From:</SPAN></B><SPAN
style="FONT-FAMILY: 'Tahoma','sans-serif'; FONT-SIZE: 10pt">
postgis-users-bounces@postgis.refractions.net
[mailto:postgis-users-bounces@postgis.refractions.net] <B>On Behalf Of
</B>Johnathan Leppert<BR><B>Sent:</B> Tuesday, November 29, 2011 11:38
PM<BR><B>To:</B> PostGIS Users Discussion<BR><B>Subject:</B> Re:
[postgis-users] Batch geocoding (~2 mil addresses)<o:p></o:p></SPAN></P>
<P class=MsoNormal><o:p> </o:p></P>
<P class=MsoNormal>Hi Ravi,<o:p></o:p></P>
<DIV>
<P class=MsoNormal><o:p> </o:p></P></DIV>
<DIV>
<P class=MsoNormal>There is a thread about the tiger geocoder performance from
awhile back you might want to look into. I had to do this before with about 1M
addresses and ended up creating an ec2 instance for each state and sending
addresses with that state to that specific host (using a simple regex to match
state in an address line). Somewhat expensive, but got the job done
quickly.<o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><o:p> </o:p></P></DIV>
<DIV>
<P class=MsoNormal>Johnathan<o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><o:p> </o:p></P>
<DIV>
<P class=MsoNormal>On Tue, Nov 29, 2011 at 9:18 PM, Ravi ada <<A
href="mailto:raviada@dobeyond.com">raviada@dobeyond.com</A>>
wrote:<o:p></o:p></P>
<DIV>
<DIV>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>Hello All,<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>I have been fighting with PostGIS for the last week or two to
geocode about 2 million addresses. Here are the challenges that I am
facing.<o:p></o:p></P>
<P>1)<SPAN style="FONT-SIZE: 7pt"> </SPAN>I
sorted the addresses by zip, address, city to group the similar addresses
together.<o:p></o:p></P>
<P>2)<SPAN style="FONT-SIZE: 7pt"> </SPAN>So far
I have seen only 50% of rating zero, that means perfect match.<o:p></o:p></P>
<P>3)<SPAN style="FONT-SIZE: 7pt"> </SPAN>Some
addresses have Suite# or Apt# in address line 1. Geocode function takes longer
time for such addresses<o:p></o:p></P>
<P>4)<SPAN style="FONT-SIZE: 7pt">
</SPAN>Addresses that are on the interstate, may be on the service road. These
are taking longer to code.<o:p></o:p></P>
<P>5)<SPAN style="FONT-SIZE: 7pt">
</SPAN>Getting syntax errors on line_segment function when processing some
addresses that have additional data in parenthesis and a period. For example
"5727 FM 3097(HORIZON RD.),ROCKWALL,TX,750327786" <o:p></o:p></P>
<P>6)<SPAN style="FONT-SIZE: 7pt"> </SPAN>I am
using batch updates selecting the addresses per state to reduce the size of
the batch. I used straight SQL update as mentioned in the documentation, and
wrote a function to call geo code for each address. In both cases the query
ran for more than 6 hours and never been able finish successfully. If I kill
the query, nothing gets updated to the address table because the whole
transaction is killed and not committed.<o:p></o:p></P>
<P>7)<SPAN style="FONT-SIZE: 7pt"> </SPAN>I am
running this on Ubuntu 11.10 and postgresql 9.1 on a virtual machine on 8GB i5
machine. I know the resources are very less for such a task.<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>My questions are<o:p></o:p></P>
<P>1)<SPAN style="FONT-SIZE: 7pt"> </SPAN>How
other people are geocoding so many addresses, what is the best way to do it,
knowing that we may not have fully normalized addresses.<o:p></o:p></P>
<P>2)<SPAN style="FONT-SIZE: 7pt"> </SPAN>How
long it should take to geocode about 2mil addresses.<o:p></o:p></P>
<P>3)<SPAN style="FONT-SIZE: 7pt"> </SPAN>What
does it mean if the rating is non-zero, and more than 50. Do we get the
distance wring for such addresses?<o:p></o:p></P>
<P>4)<SPAN style="FONT-SIZE: 7pt"> </SPAN>What
is the recommended hardware configuration? We need to geocode these addresses
only once, and periodically to the delta that we get every month.
<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>Thank you so much for all the efforts that you guys are
contributing to the project. Any help with my questions would greatly be
appreciated.<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>Thanks<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal>Ravi Ada<o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P>
<P style="mso-margin-top-alt: auto; mso-margin-bottom-alt: auto"
class=MsoNormal> <o:p></o:p></P></DIV></DIV>
<P style="MARGIN-BOTTOM: 12pt"
class=MsoNormal><BR>_______________________________________________<BR>postgis-users
mailing list<BR><A
href="mailto:postgis-users@postgis.refractions.net">postgis-users@postgis.refractions.net</A><BR><A
href="http://postgis.refractions.net/mailman/listinfo/postgis-users"
target=_blank>http://postgis.refractions.net/mailman/listinfo/postgis-users</A><o:p></o:p></P></DIV>
<P class=MsoNormal><BR><BR clear=all><o:p></o:p></P>
<DIV>
<P class=MsoNormal><o:p> </o:p></P></DIV>
<P class=MsoNormal>-- <BR><I><SPAN
style="COLOR: #cccccc">Johnathan</SPAN></I><o:p></o:p></P>
<DIV>
<P class=MsoNormal><SPAN style="COLOR: #cccccc">Software
Engineer</SPAN><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><SPAN style="COLOR: #cccccc">San
Francisco, California</SPAN><o:p></o:p></P></DIV>
<DIV>
<P class=MsoNormal><I><SPAN style="COLOR: #cccccc"><A
href="http://twitter.com/#!/iamleppert" target=_blank>Follow me on Twitter:
@iamleppert</A></SPAN></I><o:p></o:p></P></DIV>
<P class=MsoNormal><o:p> </o:p></P></DIV></DIV></BLOCKQUOTE></BODY></HTML>