[postgis-users] Batch geocoding (~2 mil addresses)

Ravi ada raviada at dobeyond.com
Tue Nov 29 06:48:58 PST 2011


Hello All,

 

I have been fighting with PostGIS for the last week or two to geocode about
2 million addresses. Here are the challenges that I am facing.

1)      I sorted the addresses by zip, address, city to group the similar
addresses together.

2)      So far I have seen only 50% of rating zero, that means perfect
match.

3)      Some addresses have Suite# or Apt# in address line 1. Geocode
function takes longer time for such addresses

4)      Addresses that are on the interstate, may be on the service road.
These are taking longer to code.

5)      Getting syntax errors on line_segment function when processing some
addresses that have additional data in parenthesis and a period. For example
"5727 FM 3097(HORIZON RD.),ROCKWALL,TX,750327786" 

6)      I am using batch updates selecting the addresses per state to reduce
the size of the batch. I used straight SQL update as mentioned in the
documentation, and wrote a function to call geo code for each address. In
both cases the query ran for more than 6 hours and never been able finish
successfully. If I kill the query, nothing gets updated to the address table
because the whole transaction is killed and not committed.

7)      I am running this on Ubuntu 11.10 and postgresql 9.1 on a virtual
machine on 8GB i5 machine. I know the resources are very less for such a
task.

 

My questions are

1)      How other people are geocoding so many addresses, what is the best
way to do it, knowing that we may not have fully normalized addresses.

2)      How long it should take to geocode about 2mil addresses.

3)      What does it mean if the rating is non-zero, and more than 50. Do we
get the distance wring for such addresses?

4)      What is the recommended hardware configuration? We need to geocode
these addresses only once, and periodically to the delta that we get every
month. 

 

Thank you so much for all the efforts that you guys are contributing to the
project. Any help with my questions would greatly be appreciated.

 

Thanks

Ravi Ada

 

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20111129/781513ce/attachment.html>


More information about the postgis-users mailing list