[postgis-users] Is Index fetish bad?

Fri Mar 20 16:51:34 PDT 2009

Ben --

Hopefully the windage below is not too inaccurate and perhaps sheds some light. You might ask it of the performance or general mail list of PostgreSQL itself.

I'm not a fan of fetishes, so my quick&dirty answer is: Yes.

Unpacked a bit:

There is an overhead to indexes, especially when they are long -- text strings for example, or decimals w/ lots of digits.

When PostgreSQL does any form of lookup -- basically any time there is a WHERE clause -- it will try to use an index if there is one and if the planner thinks that the number of rows likely to be returned is small. If there are lots of tuples to be processed, a sequential scan will be used as it is faster than the potential two reads for each indexed item (one to the index and one to get the data).

Remember that FKs do not need to have indexes but it almost always helps to have such an index, since a delete of a single row from the parent table would need to scan all rows of the other to make sure that the delete was allowed. With an index that operation is likely to be much faster.

The GIST indexes that postGIS uses do take a while to build, especially on large tables, but they can work to pare spatial queries down to managable numbers of rows; without them even trivial operations would take forever.

You can monitor index usage via the system tables; indexes which aren't used are excellent candidates for elimination.

I don't have much experience with them, but I have heard that some data warehousing schemes don't have much in the way of indexing as most operations are going to wade through massive numbers of rows anyway. OLTP application, OTH, almost always benefit from indexing.

As a practical matter, it can be faster to drop indexes, do massive data loads/deletes, and then rebuild the indexes -- has impacts of user access but it can lead to indexes that are more efficient if the data is presorted, and they'll tend to be more clustered on the disk so more of them are likely to be found in RAM. I've seen occasional slow downs from doing lots of single row transactions which result in the periodic need to rebalance a B-tree index in Informix, for example.

Index creation does require extra disk/RAM space and CPU time so they shouldn't just be thrown on will ye-nill ye. Think about the application and its data -- 1,000,000 rows with 20 distinct values is a terrible candidate for indexing; a million rows with one-hundred-thousand distinct values would probably be a win, if the values are used in queries. Ceteras Parabus.

HTH,

Greg Williamson

----- Original Message ----
From: Ben Brehmer <benbrehmer at gmail.com>
To: postgis-users at postgis.refractions.net
Sent: Friday, March 20, 2009 2:51:19 PM
Subject: [postgis-users] Is Index fetish bad?

Hi All,

In what kind of scenarios should one not create an index on a table?

As I understand it, indexes can only be beneficial for SELECT statements. Unless of course the query planner inadvertently creates an inefficient query plan. I'm wondering how are INSERTs, DELETEs, and UPDATEs affected by the overhead incurred by index updates. Maybe the updating of the index is negligent in comparison to the actual INSERT, UPDATE or DELETE?

Is there any performance difference if an index is being updated for a thousand row table versus a 20 million row table?

Just wondering,

Ben
_______________________________________________
postgis-users mailing list
postgis-users at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-users