[postgis-users] Tigerdata for AZ, AS and VI

Andy Colson andy at squeakycode.net
Tue Dec 13 16:41:30 PST 2011


Oops, my bad, the index on zip made it work well.

The explain analyze is kinda useless.  You're probably not gonna like my next request.

How would you feel about pulling one address out of geo_biz_addr and then copy/pasting each select from the function geocode() and running it by hand?

Said another way, the function might look like:

create function geocode(...) as $$
begin
	select a from ... where;
	select b from somethingElse ...;
end; $$

Get an address from geo_biz_addr, then run:

explain analyze select a from ... where;

Then:
explain analyze select b from somethingElse ...;

You'll have to substitute the arguments by hand.

OH.. Or, there is a plugin to auto-explain queries.  I have no idea if it'll do each select inside a function separately or not.

Where would one find this geocode function?  Is it part of postgis?  Or a different package you downloaded?

-Andy


On 12/13/2011 05:28 PM, Ravi Ada wrote:
> Andy,
> Here is the explain analyze output.
> "Limit  (cost=0.00..14.10 rows=100 width=73) (actual time=4824.392..98929.180
> rows=100 loops=1)"
> "  ->  Index Scan using geo_biz_addr_zip_idx on geo_biz_addr ag
> (cost=0.00..219048.99 rows=1553779 width=73) (actual time=4824.381..98925.304
> rows=100 loops=1)"
> "        Filter: (rating IS NULL)"
> "Total runtime: 98930.371 ms"
>
>
> Here is the output for the query without ORDER BY zip.
> "Limit  (cost=0.00..7.06 rows=100 width=73) (actual time=63022.583..279475.286
> rows=100 loops=1)"
> "  ->  Seq Scan on geo_biz_addr ag  (cost=0.00..109741.62 rows=1553779
> width=73) (actual time=63022.571..279474.529 rows=100 loops=1)"
> "        Filter: (rating IS NULL)"
> "Total runtime: 279475.678 ms"
>
> Surprisingly it took longer without the where clause, that may be because the addresses are scattered around all the states or cities. but in any case, 100 to 300 secs to geocode 100 addresses is too long. I got the work_mem to set to 4GB in postgresql.conf.
>
>
> Thanks
> Ravi Ada
> On Tue, 13 Dec 2011 14:31:34 -0600, Andy Colson wrote
>  > And instead of running the update, try running:
>  >
>  > explain analyze
>  > SELECT ag.id,
>  >
>  > (geocode(ag.address1||','||ag.city||','||ag.state||','||ag.zip)) As geo
>  >        FROM qliq.geo_biz_addr As ag
>  >        WHERE ag.rating IS NULL
>  >        ORDER BY zip
>  >        LIMIT 100
>  >
>  > Also, the order by zip, combined with the limit, means it has to
>  > pull every record, then sort by zip, then pull the first 100.  If
>  > you can drop one or the other it would run faster.
>  >
>  > -Andy
>  >
>  > On 12/13/2011 12:37 PM, Greg Williamson wrote:
>  > > Ravi --
>  > >
>  > > Could you run this with "EXPLAIN ANALYZE ..." and post the results; that
> might give something of a clue as to what issues the planner is encountering.
>  > >
>  > > Greg W.
>  > >
>  > >
>  > >
>  > > ----- Original Message -----
>  > >> From: Ravi ada<raviada at dobeyond.com>
>  > >> To: 'PostGIS Users Discussion'<postgis-users at postgis.refractions.net>
>  > >> Cc:
>  > >> Sent: Monday, December 12, 2011 8:25 PM
>  > >> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>  > >>
>  > >> T hanks Steve, That's what I thought too, I ran the
>  > >> 'install_missing_indexes"
>  > >> function, it ran for a few minutes and returned 't'. I am assuming it
>  > >> ran
>  > >> successfully. The performance is still same. I increased the work_mem to 4GB
>  > >> in postgresql.conf. It is still not acceptable.
>  > >>
>  > >> Leo/Regina, anything specific that you want me to verify on my system?
>  > >> Performance is terrible, I can never finish geocoding 3million addresses
>  > >> with this performance.
>  > >>
>  > >> Any help is highly appreciated.
>  > >>
>  > >> Thanks
>  > >> Ravi Ada
>  > >>
>  > >> -----Original Message-----
>  > >> From: postgis-users-bounces at postgis.refractions.net
>  > >> [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Stephen
>  > >> Woodbridge
>  > >> Sent: Monday, December 12, 2011 9:04 PM
>  > >> To: postgis-users at postgis.refractions.net
>  > >> Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>  > >>
>  > >> Hi Ravi,
>  > >>
>  > >> I do not have this setup on my machine, but I am willing to hazard a guess
>  > >> that you are missing an index, but then I have no idea which
>  > >> one(s) that might be. Leo and Regina are probably the experts in this, so I
>  > >> would look over their past posts on the geocoder. You might also look at the
>  > >> load and prep scripts in svn and see if there is an index there that you do
>  > >> not have on your tables.
>  > >>
>  > >> Regards,
>  > >>     -Steve
>  > >>
>  > >> On 12/12/2011 9:50 PM, Ravi ada wrote:
>  > >>>   In these examples, they used only 2GB memory and 3GHz machine but
>  > >>>   still achieved a blazing fast results. The same queries mentioned in
>  > >>>   the link taking 10 and even 100 times more time to query a particular
>  > >>>   address. I am using a 16GB, 6 Core AMD machine, dedicated to this
>  > >>>   process. I did the tuning on postgresql config file based on the
>  > >>>   recommendations. I am attaching my file here.. Please let me know if
>  > >>>   the tuning parameters look good.
>  > >>> http://postgis.refractions.net/documentation/manual-svn/Geocode.html
>  > >>>
>  > >>>   This query is supposed to take only (61ms) but on my machine is was
>  > >> (734ms).
>  > >>>   SELECT g.rating, ST_X(g.geomout) As lon, ST_Y(g.geomout) As lat,
>  > >>>       (addy).address As stno, (addy).streetname As street,
>  > >>>       (addy).streettypeabbrev As styp, (addy).location As city,
>  > >>>   (addy).stateabbrev As st,(addy).zip
>  > >>>       FROM geocode('75 State Street, Boston MA 02109') As g;
>  > >>>
>  > >>>   Thanks
>  > >>>   Ravi Ada
>  > >>>
>  > >>>   -----Original Message-----
>  > >>>   From: postgis-users-bounces at postgis.refractions.net
>  > >>>   [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>  > >>>   Ravi ada
>  > >>>   Sent: Monday, December 12, 2011 7:05 PM
>  > >>>   To: 'PostGIS Users Discussion'
>  > >>>   Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>  > >>>
>  > >>>   Thank you Steve. I downloaded AZ files again and loaded fine but
>  > >>>   others are still the same problem. According to your explanation that
>  > >> should be ok.
>  > >>>
>  > >>>   I got the postgis database loaded for all states now. I have about 3
>  > >>>   mil addresses, may not all be normalized, which I am trying to batch
>  > >>>   geocode them. I am using the example mentioned in this link.
>  > >>> http://www.postgresonline.com/journal/archives/181-pgscript_intro.html
>  > >>>
>  > >>>   I am even using 100 as a batch, my update query is too slow. Its
>  > >>>   updating at
>  > >>>   1500 per hour. That's too slow, I will never be able to finish them.
>  > >>>
>  > >>>   I have 16GB RAM, and 7200 rpm disk partitioned to hold the postgres
>  > >>>   table spaces. I am not sure what makes it run faster. Anybody has done
>  > >>>   so many addresses before? What makes the performance go faster? I am
>  > >>>   attaching the query and query plan here. www.pastie.org <http://www.pastie.org/>/3008194
>  > >>>
>  > >>>   Any help is appreciated.
>  > >>>
>  > >>>   Thanks
>  > >>>   Ravi Ada
>  > >>>
>  > >>>   -----Original Message-----
>  > >>>   From: postgis-users-bounces at postgis.refractions.net
>  > >>>   [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of
>  > >>>   Stephen Woodbridge
>  > >>>   Sent: Monday, December 12, 2011 8:50 AM
>  > >>>   To: postgis-users at postgis.refractions.net
>  > >>>   Subject: Re: [postgis-users] Tigerdata for AZ, AS and VI
>  > >>>
>  > >>>   On 12/12/2011 9:18 AM, Ravi ada wrote:
>  > >>>>   Hello All,
>  > >>>>
>  > >>>>   Has anyone experienced loading tigerdata into postgis database for
>  > >>>>   Arizona, American Samoa and Virgin Islands. I getting
>  > >> "*addr.dbf"
>  > >>>>   cannot find errors. All the other states are loaded fine. I tried to
>  > >>>>   download the shape files again thinking that they might have been
>  > >>>>   corrupted during the transmission, but even after that I am getting
>  > >>>>   the
>  > >>>   same error.
>  > >>>>
>  > >>>>   Any ideas?
>  > >>>
>  > >>>   My download of Tiger has all the *addr* files for Arizona and I
>  > >>>   believe I have accessed them all without a problem.
>  > >>>
>  > >>>
>  > >>>   In general, the *addr* files are optional, and there are none for
>  > >>>   Guam, American Samoa and Virgin Islands.
>  > >>>
>  > >>>   Typically if the county or county equivalent does not have roads with
>  > >>>   address ranges in it, then it will not have any *addr* files. So it is
>  > >>>   possible that a county in Arizona in say the desert might not have any
>  > >>>   address ranges and therefore not have that file, but looking at the
>  > >>>   list of counties in Arizona it looks like they all have those files.
>  > >>>
>  > >>>   -Steve W
>
>
> Thanks,
> Ravi Ada
> 918-630-7381




More information about the postgis-users mailing list