[postgis-users] Tigerdata for AZ, AS and VI

Tue Dec 13 20:02:49 PST 2011

Thanks Steve.

Here are the values from the postgresql.conf

max_connections = 140
shared_buffers = 2GB
temp_buffers = 8MB
work_mem = 4GB
maintenance_work_mem = 1GB
wal_buffers = 8MB
checkpoint_segments = 128
effective_cache_size = 6GB
cpu_tuple_cost = 0.0030
cpu_index_tuple_cost = 0.0010
cpu_operator_cost = 0.0005
fsync = off
checkpoint_timeout = 1h

I just don't understand why the geocode function takes so long to return the
coordinates. I am sure some of you on this list might have done the batch
geocoding millions of addresses. I may be missing just a simple configuration
which might make a whole lot of difference in the speed. I don't know what it
is. I am following the examples exactly from this link
(http://postgis.refractions.net/documentation/manual-svn/Geocode.html)

If someone is familiar with the problem willing to help me using GoTo Meeting
connection to my machine, I can arrange that too. I just have to move along
with my project and meet the deadlines. I am already delayed, everybody in my
team asking me for this everyday.

Thanks,
Ravi Ada

On Tue, 13 Dec 2011 21:25:54 -0500, Stephen Woodbridge wrote
> On 12/13/2011 8:33 PM, Greg Williamson wrote:
> > Have you run "analyze" recently on this table ? (since the last index
> > build or the last major change in data)
> >
> > The work_mem setting is fairly meaningless for this -- it applies
> > when building indexes and the like; sort_mem controls how much RAM
> > the system will try to use before it starts using disk; you might try
> > tinkering with that unless it is already large (but remember that
> > each sort in a query uses this much RAM so too aggressive a setting
> > is bad).
> 
> What is:
>    shared_buffers = ...
> set to? this is the one that needs to be set. You should google: 
> "postgresql tuning" and read how to set the postgresql.conf file. 
> You may also need to change your kernel parameters as you increase 
> the shared_buffers.
> 
> setting work_mem to a large value will not help much. You need to 
> read what each of the parameters do and then set them appropriately.
> 
> -Steve
> 
> > HTH,
> >
> > Greg W.
> >
> >
> >> ________________________________ From: Ravi
> >> Ada<raviada at dobeyond.com> To: Andy Colson<andy at squeakycode.net>;
> >> PostGIS Users Discussion<postgis-users at postgis.refractions.net>
> >> Sent: Tuesday, December 13, 2011 3:28 PM Subject: Re:
> >> [postgis-users] Tigerdata for AZ, AS and VI
> >>
> >>
> >> Andy, Here is the explain analyze  output. "Limit
> >> (cost=0.00..14.10 rows=100 width=73) (actual
> >
> > time=4824.392..98929.180
> >> rows=100
> >
> > loops=1)"
> >> "  ->    Index Scan using geo_biz_addr_zip_idx on geo_biz_addr
> >
> > ag
> >> (cost=0.00..219048.99 rows=1553779 width=73) (actual
> >
> > time=4824.381..98925.304
> >> rows=100
> >
> > loops=1)"
> >> "        Filter: (rating IS
> >
> > NULL)"
> >> "Total runtime: 98930.371
> >
> > ms"
> >>
> >>
> >> Here is the output for the query without ORDER BY zip. "Limit
> >> (cost=0.00..7.06 rows=100 width=73) (actual
> >
> > time=63022.583..279475.286
> >> rows=100
> >
> > loops=1)"
> >> "  ->    Seq Scan on geo_biz_addr ag  (cost=0.00..109741.62
> >
> > rows=1553779
> >> width=73) (actual time=63022.571..279474.529 rows=100
> >
> > loops=1)"
> >> "        Filter: (rating IS
> >
> > NULL)"
> >> "Total runtime: 279475.678
> >
> > ms"
> >>
> >> Surprisingly it took longer without the where clause, that may be
> >> because the
> > addresses are scattered around all the states or cities. but in any
> > case, 100 to 300 secs to
> >
> > geocode 100 addresses is too long. I got the work_mem to set to 4GB
> > in postgresql.conf.
> >>
> >>
> >> Thanks Ravi Ada On Tue, 13 Dec 2011 14:31:34 -0600, Andy Colson
> > wrote
> >>> And instead of running the update, try
> > running:
> >>>
> >>> explain
> > analyze
> >>> SELECT
> > ag.id,
> >>>
> >>> (geocode(ag.address1||','||ag.city||','||ag.state||','||ag.zip))
> >>> As
> > geo
> >>> FROM qliq.geo_biz_addr As
> > ag
> >>> WHERE ag.rating IS
> > NULL
> >>> ORDER BY
> > zip
> >>> LIMIT
> > 100
> >>>
> >>> Also, the order by zip, combined with the limit, means it has to
> >>> pull every record, then sort by zip, then pull the first 100.
> >>> If you can drop one or the other it would run
> > faster.
> >>>
> >>>
> > -Andy
> >>>
> >>> On 12/13/2011 12:37 PM, Greg Williamson
> > wrote:
> >>>> Ravi
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users

Thanks,
Ravi Ada
918-630-7381