[postgis-users] RE: geometry_columns index and postgres performance

Gregory S. Williamson gsw at globexplorer.com
Thu Aug 12 03:37:16 PDT 2004


You should also do a "vacuum analyze geometry_columns;" -- we saw issue when this was not done even with only 220 columns in it. vacuuming prevented the sequential scans which are a killer on performance.

G

-----Original Message-----
From:	Gregory S. Williamson
Sent:	Wed 8/11/2004 3:16 AM
To:	postgis-users at postgis.refractions.net
Cc:	
Subject:	geometry_columns index and postgres performance

This is not so much a request for help (public thanks to Paul and Kevin from Refractions for some advice already given), as it is a caveat to other postGIS users and perhaps a suggestion that an index be added to the geometry_columns table creation script.

(Or maybe not -- this may be something we're doing that's unique.)

This was seen on Dell servers (1750s I think) with 2 CPUs and 2 gigs of ram, running PostgreSQL 7.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 (Mandrake Linux 9.1 3.2.2-3mdk), postGIS version is 0.8 USE_GEOS=1 USE_PROJ=1 USE_STATS=1. It was consistant across 3 servers, although with some variation.

We added a lot of tables to our runtime (varying amounts of data from 1 row to perhaps 600k) and made sure they had the geometry stats update. We saw tremendous degradation of response time under load, with the servers eating RAM and lots of context switches. Very ugly.

Looking at the pg_stat_user_tables and similar views I noticed that we had a lot of sequential scans on some tables; some of them are image tables with only one row so a sequential scan makes great sense, but i also saw:
   relid   |     schemaname     |         relname         | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
-----------+--------------------+-------------------------+----------+----------
----+----------+---------------+-----------+-----------+-----------
 138603026 | public             | geometry_columns        |      649 |       119344 |        0 |             0 |       216 |     72036 |         0
 138603677 | public             | mosaicable_codes        |       13 |
 26 |     3796 |          3796 |         2 |         0 |         0
...

At first I ignored this assuming it was the same deal -- postgres' optimizer being smart, but as our site continued to flounder with only the addition of tables (tables not being referenced, by the way) I tried making an index:
CREATE INDEX geocol_attrvar ON geometry_columns (attrelid, varattnum);

(ok i am slow and should have seen this earlier!)

And lo! and behold our context switching dropped from 50-100k per 5 seconds to a few hundred and site performance was restored. The result was almost instant.

--> tenative conclusion: we pushed ourselves over some edge because the geometry_columns table got large enough (220 rows with stats on data distribution) to cause serious degradation when it was repeatedly scanned. We'll verify this in a day or so when we push the new tables into production and see if we get hammered.

The snippet of a log, below, shows part of a thread working its way through a spatial request. I sincerley believe that the "SELECT stats FROM GEOMETRY_COLUMNS" query is generated inside postGIS as part of the processing of the preceding spatial query.

So if you use postGIS and you see the geometry_columns table high up on a list of sequential scans, create an index on the two columns shown.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

2004-08-11 02:54:02 [8720] LOG:  statement: begin;
2004-08-11 02:54:02 [8720] LOG:  duration: 0.097 ms
2004-08-11 02:54:02 [8720] LOG:  statement: select tracking_id, p_disk_id, p_machine, b_disk_id, b_machine, stacking_or, p_directory, b_directory, filename, proj_key, copyright, the_geom, area(intersection(the_geom, GeometryFromText('POLYGON((-118.50908964373109 34.252493371756934, -118.5067083562689 34.252493371756934, -118.5067083562689 34.250906628243065, -118.50908964373109 34.250906628243065, -118.50908964373109 34.252493371756934))', 4326))) from sid_content_prem3 where
 the_geom && GeometryFromText('POLYGON((-118.50908964373109 34.252493371756934,-118.5067083562689 34.252493371756934, -118.5067083562689 34.250906628243065, -118.50908964373109 34.250906628243065, -118.50908964373109 34.252493371756934))', 4326) order by stacking_or, 13 desc;
2004-08-11 02:54:02 [8720] LOG:  statement: SELECT stats FROM GEOMETRY_COLUMNS WHERE attrelid=47056990 AND varattnum=13
2004-08-11 02:54:02 [8720] LOG:  statement: SELECT stats FROM GEOMETRY_COLUMNS WHERE attrelid=47056990 AND varattnum=13
2004-08-11 02:54:02 [8720] LOG:  duration: 5.699 ms

[this is with the index]




More information about the postgis-users mailing list