[postgis-users] Tigerdata for AZ, AS and VI

Stephen Woodbridge woodbri at swoodbridge.com
Tue Dec 13 20:54:38 PST 2011


You have 16 GB of memory, so set shared_buffers = 12GB.
PostgreSQL will probably then fail to start, because your kernel may not 
have SHMMAX set high enough, so you will need to raise it:

You can see the current settings with:
ipcs -l
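The shared memory section of the output will look something like this 
(values vary by system; the ones below are typical old defaults, which 
are far too small for a large shared_buffers):

------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 32768
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1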

You can change it by editing:
/etc/sysctl.conf
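For example, to allow a 12 GB shared_buffers you would add something 
like this (illustrative values only: kernel.shmmax is in bytes and must 
exceed shared_buffers plus some overhead, kernel.shmall is in 4 kB 
pages, here shmmax/4096):

kernel.shmmax = 13958643712
kernel.shmall = 3407872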

then reload the new parameters with:
sysctl -p

and check it again with:
ipcs -l

and finally restart PostgreSQL:
/etc/init.d/postgresql-<version> restart

and then try it again. After the restart you may need to run a few 
queries to warm the page cache before you start seeing any performance 
increase.
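For example, geocoding a few addresses is enough to start pulling the 
TIGER pages back in (the address below is made up):

SELECT g.rating, ST_X(g.geomout) AS lon, ST_Y(g.geomout) AS lat
  FROM geocode('100 N Main St, Phoenix, AZ 85001') AS g;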

Also, you should sort your addresses by zip code before you geocode 
them, so that the data each lookup needs is already in the page cache 
from the last record you geocoded.
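For example (a sketch reusing the table and column names from the query 
quoted below; adjust to your schema):

SELECT ag.id,
       geocode(ag.address1||','||ag.city||','||ag.state||','||ag.zip) AS geo
  FROM qliq.geo_biz_addr AS ag
 WHERE ag.rating IS NULL   -- only rows not yet geocoded
 ORDER BY ag.zip           -- consecutive lookups hit the same TIGER pages
 LIMIT 100;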

-Steve

On 12/13/2011 11:02 PM, Ravi Ada wrote:
> Thanks Steve.
>
> Here are the values from the postgresql.conf
>
> max_connections = 140
> shared_buffers = 2GB
> temp_buffers = 8MB
> work_mem = 4GB
> maintenance_work_mem = 1GB
> wal_buffers = 8MB
> checkpoint_segments = 128
> effective_cache_size = 6GB
> cpu_tuple_cost = 0.0030
> cpu_index_tuple_cost = 0.0010
> cpu_operator_cost = 0.0005
> fsync = off
> checkpoint_timeout = 1h
>
> I just don't understand why the geocode function takes so long to return the
> coordinates. I am sure some of you on this list have done batch geocoding of
> millions of addresses. I may be missing just a simple configuration change
> that would make a whole lot of difference in speed; I don't know what it is.
> I am following the examples exactly from this link:
> (http://postgis.refractions.net/documentation/manual-svn/Geocode.html)
>
> If someone familiar with the problem is willing to help me over a GoTo
> Meeting connection to my machine, I can arrange that too. I just have to
> move along with my project and meet the deadlines. I am already delayed, and
> everybody on my team is asking me for this every day.
>
>
> Thanks,
> Ravi Ada
>
>
>
> On Tue, 13 Dec 2011 21:25:54 -0500, Stephen Woodbridge wrote
>> On 12/13/2011 8:33 PM, Greg Williamson wrote:
>>> Have you run "analyze" recently on this table ? (since the last index
>>> build or the last major change in data)
>>>
>>> The work_mem setting is fairly meaningless for this -- it applies
>>> when building indexes and the like; sort_mem controls how much RAM
>>> the system will try to use before it starts using disk; you might try
>>> tinkering with that unless it is already large (but remember that
>>> each sort in a query uses this much RAM so too aggressive a setting
>>> is bad).
>>
>> What is:
>>     shared_buffers = ...
>> set to? This is the one that needs to be set. You should google:
>> "postgresql tuning" and read how to set the postgresql.conf file.
>> You may also need to change your kernel parameters as you increase
>> the shared_buffers.
>>
>> Setting work_mem to a large value will not help much. You need to
>> read what each of the parameters does and then set them appropriately.
>>
>> -Steve
>>
>>> HTH,
>>>
>>> Greg W.
>>>
>>>
>>>> ________________________________
>>>> From: Ravi Ada <raviada at dobeyond.com>
>>>> To: Andy Colson <andy at squeakycode.net>; PostGIS Users Discussion <postgis-users at postgis.refractions.net>
>>>> Sent: Tuesday, December 13, 2011 3:28 PM
>>>> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>>>>
>>>>
>>>> Andy, here is the explain analyze output:
>>>>
>>>> "Limit  (cost=0.00..14.10 rows=100 width=73) (actual time=4824.392..98929.180 rows=100 loops=1)"
>>>> "  ->  Index Scan using geo_biz_addr_zip_idx on geo_biz_addr ag  (cost=0.00..219048.99 rows=1553779 width=73) (actual time=4824.381..98925.304 rows=100 loops=1)"
>>>> "        Filter: (rating IS NULL)"
>>>> "Total runtime: 98930.371 ms"
>>>>
>>>> Here is the output for the query without ORDER BY zip:
>>>>
>>>> "Limit  (cost=0.00..7.06 rows=100 width=73) (actual time=63022.583..279475.286 rows=100 loops=1)"
>>>> "  ->  Seq Scan on geo_biz_addr ag  (cost=0.00..109741.62 rows=1553779 width=73) (actual time=63022.571..279474.529 rows=100 loops=1)"
>>>> "        Filter: (rating IS NULL)"
>>>> "Total runtime: 279475.678 ms"
>>>>
>>>> Surprisingly, it took longer without the ORDER BY zip; that may be
>>>> because the addresses are scattered across all the states and cities.
>>>> But in any case, 100 to 300 seconds to geocode 100 addresses is too
>>>> long. I have work_mem set to 4GB in postgresql.conf.
>>>>
>>>> Thanks,
>>>> Ravi Ada
>>>>
>>>> On Tue, 13 Dec 2011 14:31:34 -0600, Andy Colson wrote:
>>>>> And instead of running the update, try running:
>>>>>
>>>>> explain analyze
>>>>> SELECT ag.id,
>>>>>   (geocode(ag.address1||','||ag.city||','||ag.state||','||ag.zip)) As geo
>>>>> FROM qliq.geo_biz_addr As ag
>>>>> WHERE ag.rating IS NULL
>>>>> ORDER BY zip
>>>>> LIMIT 100
>>>>>
>>>>> Also, the order by zip, combined with the limit, means it has to
>>>>> pull every record, then sort by zip, then pull the first 100.
>>>>> If you can drop one or the other it would run faster.
>>>>>
>>>>> -Andy
>>>>>
>>>>> On 12/13/2011 12:37 PM, Greg Williamson wrote:
>>>>>> Ravi
>>
>
>
> Thanks,
> Ravi Ada
> 918-630-7381
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users



