Postgis sequential scan and query performance

Fri Sep 1 11:18:52 EDT 2006

Patrick,

That's great news!

Now, for real speed improvements, figure out how to draw a map that  
does not require 100000 inputs :)  There is clearly some scale work  
to be done, since most of those 1000000 inputs probably resolve to  
little more than a couple pixels -- ie, most of their information is  
being lost in the rasterization process, so is there a way to use  
input with less information instead (lower resolution data...)

Paul

On 1-Sep-06, at 8:07 AM, P. S. Brannan wrote:

> I wanted to let you know that your suggestions worked. I pulled a  
> bunch of sql statements out of the log and ran them in psql. The  
> planner is now selecting the index for the critical query. This has  
> resulted in average execution times of less than 3 seconds. That is  
> a big improvement from 65 seconds. The entire map drawing process  
> is now running in about 6 seconds. This compares to 120 seconds  
> before the optimizations.
>
> I guess that the random_page_cost setting did the trick.
>
> Thanks for your help,
>
> Patrick
>
> Paul Ramsey wrote:
>> (Before I say anything, I am impressed with your care in  
>> understanding issues of performance which have taken me years to  
>> figure out!  Yes, compressing your tuples down to just the rows  
>> you need will probably help a bit.)
>>
>> Another thing to look at is that your PostgreSQL configuration  
>> might be sub-optimal in a couple ways:
>>
>> - Have you adjusted any postgresql.conf parameters at all?
>> - Let me assume you have a machine with 2Gb of RAM and are running  
>> little on it besides PostgreSQL/PostGIS:
>> o Set your shared_buffers to 25000. (About 200M).
>> o Set your work_mem and maintenance_work_mem 8-16 times higher  
>> than the defaults.
>> o Figure out how much memory you have allocated thus far, based on  
>> some guesses regarding the number of concurrent users.
>> o From that, subtract as much memory as you think all other user  
>> processes use.
>> o Set your effective_cache_size to the number you are now thinking of
>> o Change your random_page_cost to 2.  If you are a fiddler, adjust  
>> it up and down until you get "good" results in index usage for a  
>> number of use cases (small box, big box)
>> o  Turn autovacuum "on", just in case.
>> o If you are going to be doing a lot of mucking with tables,  
>> consider changing some threshold values to you don't get  
>> thrashing, the defaults are for a sort of continuous trickle of  
>> change, and GIS people tend to change things in large batches.  
>> (update table set foo=bar)
>>
>> Hope this helps more.
>>
>> Paul
>>
>> On 30-Aug-06, at 7:45 PM, Patrick Brannan wrote:
>>
>>> Paul,
>>>
>>> Well I re-read the PostGIS documentation - thank you very much -  
>>> then I
>>> tried the following:
>>>
>>> gisdb=# SET enable_seqscan TO off;
>>> SET
>>> gisdb=# explain analyze SELECT
>>> cfcc::text,fename::text,asbinary(force_collection(force_2d 
>>> (wkb_geometry)),'NDR'),ogc_fid::text
>>> from completechain WHERE wkb_geometry &&
>>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>>> 38.8267482378407)'::BOX3D,find_srid 
>>> ('','completechain','wkb_geometry') );
>>>
>>> QUERY
>>> PLAN
>>>
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> ------------------------------------
>>> Index Scan using idx_completechain_wkb_geometry on completechain
>>> (cost=0.00..364816.25 rows=90839 width=186) (actual
>>> time=37.049..12532.636 rows=103159 loops=1)
>>>   Index Cond: (wkb_geometry &&
>>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C43402 
>>> 3AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE 
>>> 812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geomet 
>>> ry)
>>> Total runtime: 12891.420 ms
>>> (3 rows)
>>>
>>> gisdb=# explain analyze SELECT
>>> cfcc::text,fename::text,asbinary(force_collection(force_2d 
>>> (wkb_geometry)),'NDR'),ogc_fid::text
>>> from completechain WHERE wkb_geometry &&
>>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>>> 38.8267482378407)'::BOX3D,find_srid 
>>> ('','completechain','wkb_geometry') );
>>>
>>> QUERY
>>> PLAN
>>>
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> ------------------------------------
>>> Index Scan using idx_completechain_wkb_geometry on completechain
>>> (cost=0.00..364816.25 rows=90839 width=186) (actual
>>> time=11.988..5204.924 rows=103159 loops=1)
>>>   Index Cond: (wkb_geometry &&
>>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C43402 
>>> 3AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE 
>>> 812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geomet 
>>> ry)
>>> Total runtime: 5497.049 ms
>>> (3 rows)
>>>
>>> gisdb=# explain analyze SELECT
>>> cfcc::text,fename::text,asbinary(force_collection(force_2d 
>>> (wkb_geometry)),'NDR'),ogc_fid::text
>>> from completechain WHERE wkb_geometry &&
>>> setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
>>> 38.8267482378407)'::BOX3D,find_srid 
>>> ('','completechain','wkb_geometry') );
>>>
>>> QUERY
>>> PLAN
>>>
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> -------------------------------------------------------------------- 
>>> ------------------------------------
>>> Index Scan using idx_completechain_wkb_geometry on completechain
>>> (cost=0.00..364816.25 rows=90839 width=186) (actual  
>>> time=0.088..1915.497
>>> rows=103159 loops=1)
>>>   Index Cond: (wkb_geometry &&
>>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C43402 
>>> 3AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE 
>>> 812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geomet 
>>> ry)
>>> Total runtime: 2205.876 ms
>>> (3 rows)
>>>
>>> Not only was the first run 1/5 the time of the sequential scan  
>>> run but
>>> the runs kept getting better. No matter how many I run with
>>> 'enable_seqscan' on the performance never improves. This must be
>>> something I don't understand about PostgreSQL caching.
>>>
>>> Notice that the cost estimate was actually higher on the first  
>>> one, yet
>>> the performance was much better. I'm still working my way through  
>>> this
>>> PostgreSQL stuff so maybe I'm missing something. But what I see  
>>> leads me
>>> to believe that the planner didn't do a very good job on this  
>>> one. This
>>> box covers the St. Louis metropolitan area in a database that is  
>>> loaded
>>> with everything in Missouri and Illinois. So I've got to think that
>>> using an index would be better than looking at the whole table.
>>>
>>> Finally, you are right that my real concern is mapserver. It doesn't
>>> have to be lightning fast, but I would sure like to see it better  
>>> than
>>> it is. And the reality is that I am only looking for major  
>>> highways. So
>>> maybe there is a way to get mapserver to build a query that narrows
>>> things down before it starts doing the geometry stuff.
>>>
>>> I'm also thinking that PostgreSQL loads entire rows when it reads  
>>> pages
>>> despite the fact that you may only want 1 or 2 columns. So  
>>> refactoring
>>> the database might make a big difference in disk io. This is  
>>> Tiger data
>>> and I only care about a small portion of it.
>>>
>>> The table row count, by the way, is 4,203,356. I'm very  
>>> optimistic about
>>> the possible performance given the results. I just have to find a  
>>> way to
>>> get mapserver to capitalize on it.
>>>
>>> Patrick
>>>
>>>> You are returning 103159 rows, which is a lot. Unless you have
>>>> 100million records in your table, sequence scanning will probably
>>>> work faster then index scanning, for retrieving that many rows.  
>>>> Try a
>>>> bounding box that only encompasses a few hundred rows. You  
>>>> should see
>>>> index scans turn on then.
>>>>
>>>> By "query" performance, do you mean SQL query, or mapserver "query
>>>> mode". Because the latter has some quirks that need some care to  
>>>> avoid.
>>>>
>>>> P
>>>>
>>>> On 30-Aug-06, at 1:00 PM, P. S. Brannan wrote:
>>>>
>>>>> I just imported a bunch of tiger data into postgres. I am using
>>>>> mapserver as the frontend. Everything is working fine except for
>>>>> the performance. The query performance is terrible and I can't  
>>>>> seem
>>>>> to get postgresql to use the index i created. Here's what's  
>>>>> going on:
>>>>>
>>>>> Index Creation:
>>>>> create index idx_completechain_wkb_geometry on completechain using
>>>>> gist(wkb_geometry gist_geometry_ops);
>>>>> vacuum analyze;
>>>>>
>>>>> Vacuum Analyze:
>>>>> explain analyze SELECT cfcc::text,fename::text,asbinary
>>>>> (force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text  
>>>>> from
>>>>> completechain
>>>>> WHERE wkb_geometry && setSRID('BOX3D(-90.6004589271402
>>>>> 38.4732965951534,-89.9998214028184
>>>>> 38.8267482378407)'::BOX3D,find_srid
>>>>> ('','completechain','wkb_geometry') );
>>>>>  QUERY PLAN
>>>>> ------------------------------------------------------------------ 
>>>>> ----
>>>>> ------------------------------------------------------------------ 
>>>>> ----
>>>>> ------------------------------------------------------------------ 
>>>>> ----
>>>>> --------------------------
>>>>> Seq Scan on completechain  (cost=0.00..275502.54 rows=90839
>>>>> width=186) (actual time=10718.879..55686.238 rows=103159 loops=1)
>>>>>   Filter: (wkb_geometry &&
>>>>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434 
>>>>> 023A
>>>>> A47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE 
>>>>> 812F
>>>>> D7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry 
>>>>> )
>>>>> Total runtime: 55973.870 ms
>>>>> (3 rows)
>>>>>
>>>>> Can anyone tell me what I need to do to get postgres to use the
>>>>> index? I'm sure that I'm missing something obvious.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Patrick Brannan
>>>>
>>>>
>>
>