Postgis sequential scan and query performance

Patrick Brannan brannan at BRANNANSOFTWARE.COM
Wed Aug 30 22:45:43 EDT 2006


Paul,

Well I re-read the PostGIS documentation - thank you very much - then I
tried the following:

gisdb=# SET enable_seqscan TO off;
SET
gisdb=# explain analyze SELECT
cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') );

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_completechain_wkb_geometry on completechain
(cost=0.00..364816.25 rows=90839 width=186) (actual
time=37.049..12532.636 rows=103159 loops=1)
  Index Cond: (wkb_geometry &&
'0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
Total runtime: 12891.420 ms
(3 rows)

gisdb=# explain analyze SELECT
cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') );

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_completechain_wkb_geometry on completechain
(cost=0.00..364816.25 rows=90839 width=186) (actual
time=11.988..5204.924 rows=103159 loops=1)
  Index Cond: (wkb_geometry &&
'0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
Total runtime: 5497.049 ms
(3 rows)

gisdb=# explain analyze SELECT
cfcc::text,fename::text,asbinary(force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text
from completechain WHERE wkb_geometry &&
setSRID('BOX3D(-90.6004589271402 38.4732965951534,-89.9998214028184
38.8267482378407)'::BOX3D,find_srid('','completechain','wkb_geometry') );

QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using idx_completechain_wkb_geometry on completechain
(cost=0.00..364816.25 rows=90839 width=186) (actual time=0.088..1915.497
rows=103159 loops=1)
  Index Cond: (wkb_geometry &&
'0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023AA47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812FD7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
Total runtime: 2205.876 ms
(3 rows)

Not only was the first run 1/5 the time of the sequential scan run but
the runs kept getting better. No matter how many I run with
'enable_seqscan' on the performance never improves. This must be
something I don't understand about PostgreSQL caching.

Notice that the cost estimate was actually higher on the first one, yet
the performance was much better. I'm still working my way through this
PostgreSQL stuff so maybe I'm missing something. But what I see leads me
to believe that the planner didn't do a very good job on this one. This
box covers the St. Louis metropolitan area in a database that is loaded
with everything in Missouri and Illinois. So I've got to think that
using an index would be better than looking at the whole table.

Finally, you are right that my real concern is mapserver. It doesn't
have to be lightning fast, but I would sure like to see it better than
it is. And the reality is that I am only looking for major highways. So
maybe there is a way to get mapserver to build a query that narrows
things down before it starts doing the geometry stuff.

I'm also thinking that PostgreSQL loads entire rows when it reads pages
despite the fact that you may only want 1 or 2 columns. So refactoring
the database might make a big difference in disk io. This is Tiger data
and I only care about a small portion of it.

The table row count, by the way, is 4,203,356. I'm very optimistic about
the possible performance given the results. I just have to find a way to
get mapserver to capitalize on it.

Patrick

> You are returning 103159 rows, which is a lot. Unless you have
> 100million records in your table, sequence scanning will probably
> work faster then index scanning, for retrieving that many rows. Try a
> bounding box that only encompasses a few hundred rows. You should see
> index scans turn on then.
>
> By "query" performance, do you mean SQL query, or mapserver "query
> mode". Because the latter has some quirks that need some care to avoid.
>
> P
>
> On 30-Aug-06, at 1:00 PM, P. S. Brannan wrote:
>
>> I just imported a bunch of tiger data into postgres. I am using
>> mapserver as the frontend. Everything is working fine except for
>> the performance. The query performance is terrible and I can't seem
>> to get postgresql to use the index i created. Here's what's going on:
>>
>> Index Creation:
>> create index idx_completechain_wkb_geometry on completechain using
>> gist(wkb_geometry gist_geometry_ops);
>> vacuum analyze;
>>
>> Vacuum Analyze:
>> explain analyze SELECT cfcc::text,fename::text,asbinary
>> (force_collection(force_2d(wkb_geometry)),'NDR'),ogc_fid::text from
>> completechain
>> WHERE wkb_geometry && setSRID('BOX3D(-90.6004589271402
>> 38.4732965951534,-89.9998214028184
>> 38.8267482378407)'::BOX3D,find_srid
>> ('','completechain','wkb_geometry') );
>>  QUERY PLAN
>> ----------------------------------------------------------------------
>> ----------------------------------------------------------------------
>> ----------------------------------------------------------------------
>> --------------------------
>> Seq Scan on completechain  (cost=0.00..275502.54 rows=90839
>> width=186) (actual time=10718.879..55686.238 rows=103159 loops=1)
>>   Filter: (wkb_geometry &&
>> '0103000020FF7F0000010000000500000023AA47EB6DA656C0FABE9AFB943C434023A
>> A47EB6DA656C095C6E1E2D269434089BCE812FD7F56C095C6E1E2D269434089BCE812F
>> D7F56C0FABE9AFB943C434023AA47EB6DA656C0FABE9AFB943C4340'::geometry)
>> Total runtime: 55973.870 ms
>> (3 rows)
>>
>> Can anyone tell me what I need to do to get postgres to use the
>> index? I'm sure that I'm missing something obvious.
>>
>> Thanks,
>>
>> Patrick Brannan
>
>



More information about the mapserver-users mailing list