Postgis sequential scan and query performance

Fri Sep 1 08:07:41 PDT 2006

I wanted to let you know that your suggestions worked. I pulled a bunch 
of sql statements out of the log and ran them in psql. The planner is 
now selecting the index for the critical query. This has resulted in 
average execution times of less than 3 seconds. That is a big 
improvement from 65 seconds. The entire map drawing process is now 
running in about 6 seconds. This compares to 120 seconds before the 
optimizations.

I guess that the random_page_cost setting did the trick.

Thanks for your help,

Patrick

Paul Ramsey wrote:
> (Before I say anything, I am impressed with your care in understanding 
> issues of performance which have taken me years to figure out!  Yes, 
> compressing your tuples down to just the rows you need will probably 
> help a bit.)
>
> Another thing to look at is that your PostgreSQL configuration might 
> be sub-optimal in a couple ways:
>
> - Have you adjusted any postgresql.conf parameters at all?
> - Let me assume you have a machine with 2Gb of RAM and are running 
> little on it besides PostgreSQL/PostGIS:
> o Set your shared_buffers to 25000. (About 200M).
> o Set your work_mem and maintenance_work_mem 8-16 times higher than 
> the defaults.
> o Figure out how much memory you have allocated thus far, based on 
> some guesses regarding the number of concurrent users.
> o From that, subtract as much memory as you think all other user 
> processes use.
> o Set your effective_cache_size to the number you are now thinking of
> o Change your random_page_cost to 2.  If you are a fiddler, adjust it 
> up and down until you get "good" results in index usage for a number 
> of use cases (small box, big box)
> o  Turn autovacuum "on", just in case.
> o If you are going to be doing a lot of mucking with tables, consider 
> changing some threshold values to you don't get thrashing, the 
> defaults are for a sort of continuous trickle of change, and GIS 
> people tend to change things in large batches. (update table set foo=bar)
>
> Hope this helps more.
>
> Paul
>
> On 30-Aug-06, at 7:45 PM, Patrick Brannan wrote:
>
>> Paul,
>>
>> Well I re-read the PostGIS documentation - thank you very much - then I
>> tried the following:
>>
>> gisdb=# SET enable_seqscan TO off;
>> SET
>> gisdb=# explain analyze SELECT
>> cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text 
>>
>> from completechain WHERE wkb_geometry &&
>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>> 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') 
>> );
>>
>> QUERY
>> PLAN
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
>>
>> Index Scan using idx_completechain_wkb_geometry on completechain
>> (cost=0.00..364816.25 rows=90839 width=186) (actual
>> time=37.049..12532.636 rows=103159 loops=1)
>>   Index Cond: (wkb_geometry &&
>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry) 
>>
>> Total runtime: 12891.420 ms
>> (3 rows)
>>
>> gisdb=# explain analyze SELECT
>> cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text 
>>
>> from completechain WHERE wkb_geometry &&
>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>> 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') 
>> );
>>
>> QUERY
>> PLAN
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
>>
>> Index Scan using idx_completechain_wkb_geometry on completechain
>> (cost=0.00..364816.25 rows=90839 width=186) (actual
>> time=11.988..5204.924 rows=103159 loops=1)
>>   Index Cond: (wkb_geometry &&
>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry) 
>>
>> Total runtime: 5497.049 ms
>> (3 rows)
>>
>> gisdb=# explain analyze SELECT
>> cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text 
>>
>> from completechain WHERE wkb_geometry &&
>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>> 38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') 
>> );
>>
>> QUERY
>> PLAN
>>
>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 
>>
>> Index Scan using idx_completechain_wkb_geometry on completechain
>> (cost=0.00..364816.25 rows=90839 width=186) (actual time=0.088..1915.497
>> rows=103159 loops=1)
>>   Index Cond: (wkb_geometry &&
>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry) 
>>
>> Total runtime: 2205.876 ms
>> (3 rows)
>>
>> Not only was the first run 1/5 the time of the sequential scan run but
>> the runs kept getting better. No matter how many I run with
>> 'enable_seqscan' on the performance never improves. This must be
>> something I don't understand about PostgreSQL caching.
>>
>> Notice that the cost estimate was actually higher on the first one, yet
>> the performance was much better. I'm still working my way through this
>> PostgreSQL stuff so maybe I'm missing something. But what I see leads me
>> to believe that the planner didn't do a very good job on this one. This
>> box covers the St. Louis metropolitan area in a database that is loaded
>> with everything in Missouri and Illinois. So I've got to think that
>> using an index would be better than looking at the whole table.
>>
>> Finally, you are right that my real concern is mapserver. It doesn't
>> have to be lightning fast, but I would sure like to see it better than
>> it is. And the reality is that I am only looking for major highways. So
>> maybe there is a way to get mapserver to build a query that narrows
>> things down before it starts doing the geometry stuff.
>>
>> I'm also thinking that PostgreSQL loads entire rows when it reads pages
>> despite the fact that you may only want 1 or 2 columns. So refactoring
>> the database might make a big difference in disk io. This is Tiger data
>> and I only care about a small portion of it.
>>
>> The table row count, by the way, is 4,203,356. I'm very optimistic about
>> the possible performance given the results. I just have to find a way to
>> get mapserver to capitalize on it.
>>
>> Patrick
>>
>>> You are returning 103159 rows, which is a lot. Unless you have
>>> 100million records in your table, sequence scanning will probably
>>> work faster then index scanning, for retrieving that many rows. Try a
>>> bounding box that only encompasses a few hundred rows. You should see
>>> index scans turn on then.
>>>
>>> By "query" performance, do you mean SQL query, or mapserver "query
>>> mode". Because the latter has some quirks that need some care to avoid.
>>>
>>> P
>>>
>>> On 30-Aug-06, at 1:00 PM, P. S. Brannan wrote:
>>>
>>>> I just imported a bunch of tiger data into postgres. I am using
>>>> mapserver as the frontend. Everything is working fine except for
>>>> the performance. The query performance is terrible and I can't seem
>>>> to get postgresql to use the index i created. Here's what's going on:
>>>>
>>>> Index Creation:
>>>> create index idx_completechain_wkb_geometry on completechain using
>>>> gist(wkb_geometry gist_geometry_ops);
>>>> vacuum analyze;
>>>>
>>>> Vacuum Analyze:
>>>> explain analyze SELECT cfcc::text,fename::text,asbinary
>>>> (force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text from
>>>> completechain
>>>> WHERE wkb_geometry && setSRID('BOX3D(-90.6004589271402
>>>> 38.4732965951534,-89.9998214028184
>>>> 38.8267482378407)'::BOX3D,find_srid
>>>> ('','completechain','wkb_geometry') );
>>>>  QUERY PLAN
>>>> ----------------------------------------------------------------------
>>>> ----------------------------------------------------------------------
>>>> ----------------------------------------------------------------------
>>>> --------------------------
>>>> Seq Scan on completechain  (cost=0.00..275502.54 rows=90839
>>>> width=186) (actual time=10718.879..55686.238 rows=103159 loops=1)
>>>>   Filter: (wkb_geometry &&
>>>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023A
>>>> A47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812F
>>>> D7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
>>>> Total runtime: 55973.870 ms
>>>> (3 rows)
>>>>
>>>> Can anyone tell me what I need to do to get postgres to use the
>>>> index? I'm sure that I'm missing something obvious.
>>>>
>>>> Thanks,
>>>>
>>>> Patrick Brannan
>>>
>>>
>