[postgis-users] Tigerdata for AZ, AS and VI

Greg Williamson gwilliamson39 at yahoo.com
Tue Dec 13 17:33:55 PST 2011


Have you run "analyze" recently on this table ? (since the last index build or the last major change in data)

The work_mem setting is fairly meaningless for this -- it applies when building indexes and the like; sort_mem controls how much RAM the system will try to use before it starts using disk; you might try tinkering with that unless it is already large (but remember that each sort in a query uses this much RAM so too aggressive a setting is bad).

HTH,

Greg W.


>________________________________
> From: Ravi Ada <raviada at dobeyond.com>
>To: Andy Colson <andy at squeakycode.net>; PostGIS Users Discussion <postgis-users at postgis.refractions.net> 
>Sent: Tuesday, December 13, 2011 3:28 PM
>Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
> 
>
>Andy, 
>Here is the explain analyze  output.  
>"Limit  (cost=0.00..14.10 rows=100 width=73) (actual 

time=4824.392..98929.180 
>rows=100 

loops=1)" 
>"  ->  Index Scan using geo_biz_addr_zip_idx on geo_biz_addr 

ag 
>(cost=0.00..219048.99 rows=1553779 width=73) (actual 

time=4824.381..98925.304 
>rows=100 

loops=1)" 
>"        Filter: (rating IS 

NULL)" 
>"Total runtime: 98930.371 

ms" 
>
>
>Here is the output for the query without ORDER BY zip.  
>"Limit  (cost=0.00..7.06 rows=100 width=73) (actual 

time=63022.583..279475.286 
>rows=100 

loops=1)" 
>"  ->  Seq Scan on geo_biz_addr ag  (cost=0.00..109741.62 

rows=1553779 
>width=73) (actual time=63022.571..279474.529 rows=100 

loops=1)" 
>"        Filter: (rating IS 

NULL)" 
>"Total runtime: 279475.678 

ms" 
>
>Surprisingly it took longer without the where clause, that may be because the 
addresses are scattered around all the states or cities. but in any case, 100 to 
300 secs to 

geocode 100 addresses is too long. I got the work_mem to set to 4GB in postgresql.conf. 
>
>
>Thanks 
>Ravi Ada 
>On Tue, 13 Dec 2011 14:31:34 -0600, Andy Colson 
wrote 
>> And instead of running the update, try 
running: 
>> 
>> explain 
analyze 
>> SELECT 
ag.id, 
>> 
>> (geocode(ag.address1||','||ag.city||','||ag.state||','||ag.zip)) As 
geo 
>>        FROM qliq.geo_biz_addr As 
ag 
>>        WHERE ag.rating IS 
NULL 
>>        ORDER BY 
zip 
>>        LIMIT 
100 
>> 
>> Also, the order by zip, combined with the limit, means it has to 
>> pull every record, then sort by zip, then pull the first 100.  If 
>> you can drop one or the other it would run 
faster. 
>> 
>> 
-Andy 
>> 
>> On 12/13/2011 12:37 PM, Greg Williamson 
wrote: 
>> > Ravi 
-- 
>> 
> 
>> > Could you run this with "EXPLAIN ANALYZE ..." and post the 
results; 
that 
>might give something of a clue as to what issues the planner is 
encountering. 
>> 
> 
>> > Greg 
W. 
>> 
> 
>> 
> 
>> 
> 
>> > ----- Original Message 
----- 
>> >> From: Ravi 
ada<raviada at dobeyond.com> 
>> >> To: 'PostGIS Users 
Discussion'<postgis-users at postgis.refractions.net> 
>> >> 
Cc: 
>> >> Sent: Monday, December 12, 2011 8:25 
PM 
>> >> Subject: Re: [postgis-users] Tigerdata for AZ, AS and 
VI 
>> 
>> 
>> >> T hanks Steve, That's what I thought too, I ran 
the 
>> >> 
'install_missing_indexes" 
>> >> function, it ran for a few minutes and returned 't'. I am assuming 
it 
>> >> 
ran 
>> >> successfully. The performance is still same. I increased the 
work_mem to 
4GB 
>> >> in postgresql.conf. It is still not 
acceptable. 
>> 
>> 
>> >> Leo/Regina, anything specific that you want me to verify on my 
system? 
>> >> Performance is terrible, I can never finish geocoding 3million 
addresses 
>> >> with this 
performance. 
>> 
>> 
>> >> Any help is highly 
appreciated. 
>> 
>> 
>> >> 
Thanks 
>> >> Ravi 
Ada 
>> 
>> 
>> >> -----Original 
Message----- 
>> >> From: 
postgis-users-bounces at postgis.refractions.net 
>> >> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf 
Of 
Stephen 
>> >> 
Woodbridge 
>> >> Sent: Monday, December 12, 2011 9:04 
PM 
>> >> To: 
postgis-users at postgis.refractions.net 
>> >> Subject: Re: [postgis-users] Tigerdata for AZ, AS and 
VI 
>> 
>> 
>> >> Hi 
Ravi, 
>> 
>> 
>> >> I do not have this setup on my machine, but I am willing to hazard 
a 
guess 
>> >> that you are missing an index, but then I have no idea 
which 
>> >> one(s) that might be. Leo and Regina are probably the experts in 
this, so 
I 
>> >> would look over their past posts on the geocoder. You might also 
look at 
the 
>> >> load and prep scripts in svn and see if there is an index there 
that you 
do 
>> >> not have on your 
tables. 
>> 
>> 
>> >> 
Regards, 
>> >>     
-Steve 
>> 
>> 
>> >> On 12/12/2011 9:50 PM, Ravi ada 
wrote: 
>> >>>   In these examples, they used only 2GB memory and 3GHz 
machine 
but 
>> >>>   still achieved a blazing fast results. The same queries 
mentioned 
in 
>> >>>   the link taking 10 and even 100 times more time to 
query a 
particular 
>> >>>   address. I am using a 16GB, 6 Core AMD machine, 
dedicated to 
this 
>> >>>   process. I did the tuning on postgresql config file 
based on 
the 
>> >>>   recommendations. I am attaching my file here.. Please 
let me know 
if 
>> >>>   the tuning parameters look 
good. 
>> >>>   http://postgis.refractions.net/documentation/manual-svn/Geocode.html
>> 
>>> 
>> >>>   This query is supposed to take only (61ms) but on my 
machine is 
was 
>> >> 
(734ms). 
>> >>>   SELECT g.rating, ST_X(g.geomout) As lon, 
ST_Y(g.geomout) As 
lat, 
>> >>>       (addy).address As stno, (addy).streetname 
As 
street, 
>> >>>       (addy).streettypeabbrev As styp, 
(addy).location As 
city, 
>> >>>   (addy).stateabbrev As 
st,(addy).zip 
>> >>>       FROM geocode('75 State Street, Boston MA 
02109') As 
g; 
>> 
>>> 
>> >>>   
Thanks 
>> >>>   Ravi 
Ada 
>> 
>>> 
>> >>>   -----Original 
Message----- 
>> >>>   From: 
postgis-users-bounces at postgis.refractions.net 
>> >>>   [mailto:postgis-users-bounces at postgis.refractions.net] 
On Behalf 
Of 
>> >>>   Ravi 
ada 
>> >>>   Sent: Monday, December 12, 2011 7:05 
PM 
>> >>>   To: 'PostGIS Users 
Discussion' 
>> >>>   Subject: Re: [postgis-users] Tigerdata for AZ, AS and 
VI 
>> 
>>> 
>> >>>   Thank you Steve. I downloaded AZ files again and loaded 
fine 
but 
>> >>>   others are still the same problem. According to your 
explanation 
that 
>> >> should be 
ok. 
>> 
>>> 
>> >>>   I got the postgis database loaded for all states now. I 
have about 
3 
>> >>>   mil addresses, may not all be normalized, which I am 
trying to 
batch 
>> >>>   geocode them. I am using the example mentioned in this 
link. 
>> >>>   http://www.postgresonline.com/journal/archives/181-pgscript_intro.html
>> 
>>> 
>> >>>   I am even using 100 as a batch, my update query is too 
slow. 
Its 
>> >>>   updating 
at 
>> >>>   1500 per hour. That's too slow, I will never be able to 
finish 
them. 
>> 
>>> 
>> >>>   I have 16GB RAM, and 7200 rpm disk partitioned to hold 
the 
postgres 
>> >>>   table spaces. I am not sure what makes it run faster. 
Anybody has 
done 
>> >>>   so many addresses before? What makes the performance go 
faster? I 
am 
>> >>>   attaching the query and query plan here. www.pastie.org/3008194
>> 
>>> 
>> >>>   Any help is 
appreciated. 
>> 
>>> 
>> >>>   
Thanks 
>> >>>   Ravi 
Ada 
>> 
>>> 
>> >>>   -----Original 
Message----- 
>> >>>   From: 
postgis-users-bounces at postgis.refractions.net 
>> >>>   [mailto:postgis-users-bounces at postgis.refractions.net] 
On Behalf 
Of 
>> >>>   Stephen 
Woodbridge 
>> >>>   Sent: Monday, December 12, 2011 8:50 
AM 
>> >>>   To: 
postgis-users at postgis.refractions.net 
>> >>>   Subject: Re: [postgis-users] Tigerdata for AZ, AS and 
VI 
>> 
>>> 
>> >>>   On 12/12/2011 9:18 AM, Ravi ada 
wrote: 
>> >>>>   Hello 
All, 
>> 
>>>> 
>> >>>>   Has anyone experienced loading tigerdata into 
postgis database 
for 
>> >>>>   Arizona, American Samoa and Virgin Islands. I 
getting 
>> >> 
"*addr.dbf" 
>> >>>>   cannot find errors. All the other states are loaded 
fine. I tried 
to 
>> >>>>   download the shape files again thinking that they 
might have 
been 
>> >>>>   corrupted during the transmission, but even after 
that I am 
getting 
>> >>>>   
the 
>> >>>   same 
error. 
>> 
>>>> 
>> >>>>   Any 
ideas? 
>> 
>>> 
>> >>>   My download of Tiger has all the *addr* files for 
Arizona and 
I 
>> >>>   believe I have accessed them all without a 
problem. 
>> 
>>> 
>> 
>>> 
>> >>>   In general, the *addr* files are optional, and there 
are none 
for 
>> >>>   Guam, American Samoa and Virgin 
Islands. 
>> 
>>> 
>> >>>   Typically if the county or county equivalent does not 
have roads 
with 
>> >>>   address ranges in it, then it will not have any *addr* 
files. So it 
is 
>> >>>   possible that a county in Arizona in say the desert 
might not have 
any 
>> >>>   address ranges and therefore not have that file, but 
looking at 
the 
>> >>>   list of counties in Arizona it looks like they all have 
those 
files. 
>> 
>>> 
>> >>>   -Steve 
W 
>
>
>Thanks, 
>Ravi 
Ada 
>918-630-7381 
> 
>_______________________________________________
>postgis-users mailing list
>postgis-users at postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-users
>
>
>



More information about the postgis-users mailing list