[postgis-users] Help with Bad Query Plan

Paul Ramsey pramsey at opengeo.org
Thu Jan 8 11:41:13 PST 2009


Is there a Step 6, where your run your final query, but you haven't  
created an index yet? Because step 5 is functionally the same as Step  
8, which you note doesn't return.

On Jan 8, 2009, at 11:30 AM, Oliver Snowden wrote:

> Dear all, I am having some problems with a bad query plan.   
> Essentially,
> I have two tables.  A table of many different types of geography and
> another table (reports) linking to those geographies.  We want to  
> select
> a small subset of reports (e.g. 30 reports) based on date and  
> intersect
> those report geographies (potentially historical geographies) with a
> (current) selected geography.
>
> I have (conceptually) worked out what I believe to be most efficient  
> by
> putting the query into canonical form, moving the restricts down and
> putting the most restrictive item first; although I haven't actually
> done any specific calculations - index/disk access speeds etc.
>
> My view is that the query should be executed as shown below.
>
> Any comments or suggestions will be greatly appreciated.  We have  
> tried
> various things inc. subqueries, using &&, and indexing almost  
> everything
> etc.  Still no luck.
>
> Many thanks, Oliver.
>
>
>                                       project
>                                 geolink.report_id
>                                         |
>                   intersect (geography.the_geom,
> selected_geography.the_geom)
>                                         |
>                 ------------------------------------------------
>                 |                                               |
>               project                                       project
> geolink.report_id, geography.the_geom
> selected_geography.the_geom
>                 |                                               |
>                join                                          restrict
>     geolink.geom_id = geography.gid
> selected_geography.gid = 3
>                 |                                               |
>    ---------------------------                    [geography as
> selected_geography]
>    |                         |
> project                   project
> report_id, geom_id      gid, the_geom
>    |                         |
>  restrict                    |
> based on date                 |
>    |                         |
> [geolink]                [geography]
>
>
> Preparation:
>
> 1. First experienced the issue on PostGIS 1.3.3-1 on Ubuntu 8.10.
> Installed PostGIS 1.3.5 (PostgreSQL 8.3) on Windows Vista and the
> details, below, refer to that Windows installation.
>
> 2. Loaded continent, region, country, admin areas, and cities from the
> ESRI Data & Maps CD (2006 data).
> Data was loaded with SPIT in Quantum GIS 0.11.0-Metis using SRID 4326.
>
>
> ##############################
> #
> # COMPLETE SCRIPT v4
> #
> ##############################
>
> -- 1.
> CREATE SEQUENCE gdors_geography_pkey_seq;
>
> CREATE TABLE gdors_geography (gid serial PRIMARY KEY,
> geotype varchar(32),
> published date,
> supplier varchar(64),
> objectid integer,
> name character varying(128),
> sqmi double precision,
> sqkm double precision,
> comment character varying(192));
> SELECT
> AddGeometryColumn('','gdors_geography','the_geom','4326','GEOMETRY', 
> 2);
> ALTER TABLE gdors_geography ALTER COLUMN gid SET DEFAULT
> NEXTVAL('gdors_geography_pkey_seq');
>
>
> -- 2.
> -- (Note: inserted the following geographies four times, each, to
> increase the number of geometries)
>
> INSERT INTO gdors_geography (geotype, published, supplier, the_geom,
> objectid, name, sqmi, sqkm)
> SELECT 'CONTINENT', '2006-01-01', 'ESRI', c.the_geom, c.objectid,
> c.continent, c.sqmi, c.sqkm
> FROM continent c;
>
> INSERT INTO gdors_geography (geotype, published, supplier, the_geom,
> objectid, name, sqmi, sqkm)
> SELECT 'REGION', '2006-01-01', 'ESRI', r.the_geom, r.objectid,  
> r.region,
> r.sqmi, r.sqkm
> FROM region r;
>
> INSERT INTO gdors_geography (geotype, published, supplier, the_geom,
> objectid, name, sqmi, sqkm)
> SELECT 'COUNTRY', '2006-01-01', 'ESRI', c.the_geom, c.objectid,
> c.cntry_name, c.sqmi, c.sqkm
> FROM cntry06 c;
>
> INSERT INTO gdors_geography (geotype, published, supplier, the_geom,
> objectid, name, sqmi, sqkm)
> SELECT 'ADMIN','2006-01-01', 'ESRI', a.the_geom, a.objectid,
> a.admin_name, a.sqmi, a.sqkm
> FROM admin a;
>
> INSERT INTO gdors_geography (geotype, published, supplier, the_geom,
> objectid, name)
> SELECT 'CITY','2006-01-01', 'ESRI', c.the_geom, c.objectid,  
> c.city_name
> FROM city c;
>
> -- 3.  Create table linking, say, reports to geography.
> CREATE TABLE gdors_geolink
> (
>  report_id integer,
>  geom_id integer,
>  lastupdated date,
>  temp_report_date date,
>  CONSTRAINT gdors_geolink_pkey PRIMARY KEY (report_id, geom_id),
>  CONSTRAINT gdors_geolink_geom_id_fkey FOREIGN KEY (geom_id)
>      REFERENCES gdors_geography (gid) MATCH SIMPLE
>      ON UPDATE NO ACTION ON DELETE NO ACTION
> );
>
>
> -- 4. Insert lots of report data to geography. For example:
> INSERT INTO gdors_geolink(report_id, geom_id, lastupdated,
> temp_report_date) VALUES (1,5061,'2006-06-01', '2005-10-01');
> INSERT INTO gdors_geolink(report_id, geom_id, lastupdated,
> temp_report_date) VALUES (2,20034,'2006-06-01', '1980-07-27');
> INSERT INTO gdors_geolink(report_id, geom_id, lastupdated,
> temp_report_date) VALUES (3,20848,'2006-06-01', '1991-07-15');
> --...10,000 report entries
>
> -- 5. Display table sizes.
>
> -- FYI: 10000 Reports
> SELECT COUNT(*) AS count
> FROM gdors_geolink;
>
> --FYI: 21620 Geographies
> SELECT COUNT(*) AS count
> FROM gdors_geography;
>
>
> -- 6. Build up query.
>
> -- STEP 1 (reports between dates)
> -- 275 rows, 15ms.
> SELECT geolink.report_id
> FROM gdors_geolink geolink
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06';
>
> -- STEP 2 (join geography)
> -- 275 rows, 16ms.
> SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid;
>
> -- STEP 3 (select geography, e.g. Asia continent)
> -- 275 rows, 26ms.
> SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3;
>
> -- STEP 4 (intersect Asia continent over all other geographies)
> -- 81 rows, ~13000ms - a large increase in query time.
> SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
> AND ST_Intersects(selected_geography.the_geom, geography.the_geom);
>
>
> -- STEP 5 (count reports within geography)
> -- 81 rows, ~13000ms.
> SELECT COUNT(*) AS count
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
> AND ST_Intersects(selected_geography.the_geom, geography.the_geom);
>
> -- 7. Create Index:
> CREATE INDEX gdors_geography_geom_index ON gdors_geography USING GIST
> (the_geom);
> VACUUM ANALYZE;
>
> -- 8. Execute intersect query.
> -- EXECUTES FOR A VERY LONG TIME (hours? days?) - it hasn't ever
> finished for me.
> SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
> AND ST_Intersects(selected_geography.the_geom, geography.the_geom);
>
>
> -- 9. Compare query plan with and without the ST_Intersects statement:
>
> EXPLAIN ANALYZE SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
>
>
> "Nested Loop  (cost=0.00..1490.77 rows=265 width=8) (actual
> time=1.063..16.169 rows=275 loops=1)"
> "  ->  Index Scan using gdors_geography_pkey on gdors_geography
> selected_geography  (cost=0.00..8.27 rows=1 width=0) (actual
> time=0.047..0.051 rows=1 loops=1)"
> "        Index Cond: (gid = 3)"
> "  ->  Nested Loop  (cost=0.00..1479.85 rows=265 width=8) (actual
> time=0.168..14.059 rows=275 loops=1)"
> "        ->  Seq Scan on gdors_geolink geolink  (cost=0.00..205.00
> rows=265 width=8) (actual time=0.112..5.842 rows=275 loops=1)"
> "              Filter: ((temp_report_date >= '2008-01-06'::date) AND
> (temp_report_date <= '2009-01-06'::date))"
> "        ->  Index Scan using gdors_geography_pkey on gdors_geography
> geography  (cost=0.00..4.80 rows=1 width=4) (actual time=0.014..0.017
> rows=1 loops=275)"
> "              Index Cond: (geography.gid = geolink.geom_id)"
> "Total runtime: 16.927 ms"
>
> (Used 'Explain query' in pgAdminIII to get result.  EXPLAIN ANALYZE  
> too
> slow - it would not finish within hours? days?)
> SELECT geolink.report_id, geography.gid
> FROM gdors_geolink geolink, gdors_geography geography, gdors_geography
> selected_geography
> WHERE geolink.temp_report_date BETWEEN '2008-01-06' AND '2009-01-06'
> AND geolink.geom_id = geography.gid
> AND selected_geography.gid=3
> AND ST_Intersects(selected_geography.the_geom, geography.the_geom);
>
>
> "Nested Loop  (cost=0.00..91.85 rows=1 width=8)"
> "  ->  Nested Loop  (cost=0.00..16.56 rows=1 width=4)"
> "        Join Filter: _st_intersects(selected_geography.the_geom,
> geography.the_geom)"
> "        ->  Index Scan using gdors_geography_pkey on gdors_geography
> selected_geography  (cost=0.00..8.27 rows=1 width=13706)"
> "              Index Cond: (gid = 3)"
> "        ->  Index Scan using gdors_geography_geom_index on
> gdors_geography geography  (cost=0.00..8.27 rows=1 width=13710)"
> "              Index Cond: (selected_geography.the_geom &&
> geography.the_geom)"
> "              Filter: (selected_geography.the_geom &&
> geography.the_geom)"
> "  ->  Index Scan using gdors_geolink_pkey on gdors_geolink geolink
> (cost=0.00..75.28 rows=1 width=8)"
> "        Index Cond: (geolink.geom_id = geography.gid)"
> "        Filter: ((geolink.temp_report_date >= '2008-01-06'::date) AND
> (geolink.temp_report_date <= '2009-01-06'::date))"
>
>
> -- 10. Created various indexes but could not seem to improve the
> performance / influence the query plan.
> -----------------------------------------
> **************************************************************************
> The information contained in the EMail and any attachments is
> confidential and intended solely and for the attention and use of
> the named addressee(s). It may not be disclosed to any other person
> without the express authority of the HPA, or the intended
> recipient, or both. If you are not the intended recipient, you must
> not disclose, copy, distribute or retain this message or any part
> of it. This footnote also confirms that this EMail has been swept
> for computer viruses, but please re-sweep any attachments before
> opening or saving. HTTP://www.HPA.org.uk
> **************************************************************************
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users


--
Paul Ramsey
OpenGeo - http://opengeo.org
The secret sauce is called... PostGIS.




More information about the postgis-users mailing list