[postgis-users] Postgresql not using Spatial Index
dblasby at openplans.org
dblasby at openplans.org
Thu Aug 11 11:23:12 PDT 2005
>Do you have large geometries? Maybe you suffer from the TOAST
No - almost all the geometries are just a few points (this is road data
in NY city so most of the roads are 2 or 3 points).
I'm on a laptop, so my disk performance is pretty slow.
In my last message I gave the query plans with and without sequential
scans.
The actual time difference is on the order of 15* as long doing an
sequential scan and index scan. This seems pretty obvious as the
sequencial scan is reading about 20* as much data.
The estimated query times had the sequencial scan 50% faster than an
index scan.
5% doesnt seem at all marginaly to me -- 40% to 50% seems marginal.
If you have a lot of toasted tuples, then its almost always going to be
faster to do the index scan (ie. up to 95%).
The thing is, on modern hardware (even my 1.5 year old crappy laptop),
the OS and the disk do a good job of caching disk pages, so the actual
cost of doing an index scan is:
(cost of reading part of the index) + (cost of reading tuples)
The first bit is actually very cheap since its highly likely that part
of the index will already be cached.
If your table is highly spatially autocorrelated (ie. tuples that are
spatially nearby are also close together on the disk), then the 2nd one
is going to be close to the cost of doing a sequencial scan. This is
because of the way the index is constructed and queried.
So, for my query that reads 5% of the tuples, the actual cost is:
(read 5% of the index) + (read 5% of the relations pages)
For a sequencial scan, it has to read 100% of the relation pages plus
query the toast table for each toasted geometry (and then read those
disk pages). It also probably just blew apart your disk cache.
The actual timmings (see above) relect this.
So, to fix this:
1. dont use cost_random_page -- its way too large on modern hardware
2. there's a correlation coefficient in the indexing code - try setting
it to 0.99 or higher (ie. 99% autocorrelation)
3. if your stats sample has geometries > page_size/4 then its almost
certainly TOASTed -- sequencial scan are much more costly (its a HUGE
amount of work to get something out of the TOAST table).
Try testing on a system where you can clean out the disk cache easily.
Doing 15* as much work just doesnt seem reasonable to me!
dave
----------------------------------------------------------
This mail sent through IMP: https://webmail.limegroup.com/
More information about the postgis-users
mailing list