[postgis-users] Batch geocoding (~2 mil addresses)

Johnathan Leppert johnathan.leppert at gmail.com
Tue Nov 29 21:38:20 PST 2011


Hi Ravi,

There is a thread about the tiger geocoder performance from awhile back you
might want to look into. I had to do this before with about 1M addresses
and ended up creating an ec2 instance for each state and sending addresses
with that state to that specific host (using a simple regex to match state
in an address line). Somewhat expensive, but got the job done quickly.

Johnathan

On Tue, Nov 29, 2011 at 9:18 PM, Ravi ada <raviada at dobeyond.com> wrote:

> Hello All,****
>
> ** **
>
> I have been fighting with PostGIS for the last week or two to geocode
> about 2 million addresses. Here are the challenges that I am facing.****
>
> **1)      **I sorted the addresses by zip, address, city to group the
> similar addresses together.****
>
> **2)      **So far I have seen only 50% of rating zero, that means
> perfect match.****
>
> **3)      **Some addresses have Suite# or Apt# in address line 1. Geocode
> function takes longer time for such addresses****
>
> **4)      **Addresses that are on the interstate, may be on the service
> road. These are taking longer to code.****
>
> **5)      **Getting syntax errors on line_segment function when
> processing some addresses that have additional data in parenthesis and a
> period. For example "5727 FM 3097(HORIZON RD.),ROCKWALL,TX,750327786" ****
>
> **6)      **I am using batch updates selecting the addresses per state to
> reduce the size of the batch. I used straight SQL update as mentioned in
> the documentation, and wrote a function to call geo code for each address.
> In both cases the query ran for more than 6 hours and never been able
> finish successfully. If I kill the query, nothing gets updated to the
> address table because the whole transaction is killed and not committed.**
> **
>
> **7)      **I am running this on Ubuntu 11.10 and postgresql 9.1 on a
> virtual machine on 8GB i5 machine. I know the resources are very less for
> such a task.****
>
> ** **
>
> My questions are****
>
> **1)      **How other people are geocoding so many addresses, what is the
> best way to do it, knowing that we may not have fully normalized addresses.
> ****
>
> **2)      **How long it should take to geocode about 2mil addresses.****
>
> **3)      **What does it mean if the rating is non-zero, and more than
> 50. Do we get the distance wring for such addresses?****
>
> **4)      **What is the recommended hardware configuration? We need to
> geocode these addresses only once, and periodically to the delta that we
> get every month. ****
>
> ** **
>
> Thank you so much for all the efforts that you guys are contributing to
> the project. Any help with my questions would greatly be appreciated.****
>
> ** **
>
> Thanks****
>
> Ravi Ada****
>
> ** **
>
> ** **
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>


-- 
*Johnathan*
Software Engineer
San Francisco, California
*Follow me on Twitter: @iamleppert <http://twitter.com/#!/iamleppert>*
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20111129/552d946e/attachment.html>


More information about the postgis-users mailing list