[postgis-users] Increase query performance

Tue Mar 8 13:56:16 PST 2011

Andreas just a thought. 
I would be a little suspicious about the grid. Could it be that your
grid doesn't consist of just 4 corner cells. As I understood from your
other post it was created in python.
I have not experienced it in postgis, but in other software that when
you have played enough with something you think is a very simple geomety
it happens to have a lot of vertex points because of transformations.

Can you have done anything causing the grid to be more complex than
expected? you can check that out with st_npoints.

Maybe you have mentioned, but do you get a proper index-scan.

Have you tried to just do the intersects test without intersection?
If that is very much faster it might be an idea to handle grid cells
contained in polygons separately since they will be intact grid cells. I
don't know if there is such a shortcut in st_intersection. If not, maybe
something to take a look at.

Maybe something like below. Only do the ST_Intersection calculation when
the grid cell intersects the boundary of the cshapeswdate.geom otherwise
if the grid is within cshapeswdate.geom just take the whole grid cell.

Maybe faster, I don't know.

SELECT ST_Intersection(priogrid_land.cell, cshapeswdate.geom) AS geom,
priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate,
enddate, capname, caplong, caplat, col, row, xcoord, ycoord INTO
cshapesgrid1946 FROM priogrid_land, cshapeswdate WHERE
ST_Intersects(priogrid_land.cell, st_boundary(cshapeswdate.geom)) AND
cshapeswdate.gwsyear <=1946 AND cshapeswdate.gweyear >=1946 AND
cshapeswdate.startdate <= '1946/1/1'

union all

SELECT priogrid_land.cell AS geom, priogrid_land.gid AS divider, gwcode,
gwsyear, gweyear, startdate, enddate, capname, caplong, caplat, col,
row, xcoord, ycoord INTO cshapesgrid1946 FROM priogrid_land,
cshapeswdate WHERE ST_within(priogrid_land.cell, cshapeswdate.geom) AND
cshapeswdate.gwsyear <=1946 AND cshapeswdate.gweyear >=1946 AND
cshapeswdate.startdate <= '1946/1/1'
;

/Nicklas

On Tue, 2011-03-08 at 16:02 -0500, Paragon Corporation wrote:
> Andreas,
>  
> Try ST_SimplifyPreserveTolerance.  The ST_Simplify often simplifies to
> nothing or close.
>  
> Like I mentioned in the other post, it could be some huge geometries
> causing your problems.  You don't necessarily want to simplify
> everything.
>  
> We usually do a conditional simplify like
>  
> CASE WHEN ST_NPoints(geom) > 3000 THEN
> ST_SimplyfyPreserveTopology(geom,0.1) ELSE geom END
>  
>  
> It probably wouldn't hurt to do a max check on your tables or a count
> to see how many have more than n number of points.
>  
> SELECT MAX(ST_NPoints(geom)) As biggest, COUNT(CASE WHEN
> ST_NPoints(geom) > 3000 THEN 1 ELSE NULL END) as cnt_big_geoms
> FROM yourtable
>  
> To get a sense of the largest geometry you are dealing with.
>  
> Hope that helps,
> Regina
> http://www.postgis.us
> 
> 
> ______________________________________________________________________
> From: Andreas Forø Tollefsen [mailto:andreasft at gmail.com] 
> Sent: Tuesday, March 08, 2011 12:02 PM
> To: PostGIS Users Discussion
> Cc: Paragon Corporation
> Subject: Re: [postgis-users] Increase query performance
> 
> 
> 
> After a suggestion from pgsql_performance i tried with ST_Simplify to
> speed things up. 
> However this gives me a:
> NOTICE: ptarray_simplify returned a <2 pts array
> 
> 
> Then server connection terminates.
> 
> 
> Like this:  
> 
> 
> SELECT ST_Intersection(priogrid_land.cell,
> ST_Simplify(cshapeswdate.geom,0.1)) AS geom, 
> priogrid_land.gid AS divider, gwcode, gwsyear, gweyear, startdate,
> enddate, capname, caplong, caplat, col, row, xcoord, ycoord 
> FROM priogrid_land, cshapeswdate WHERE
> ST_Intersects(priogrid_land.cell, ST_Simplify(cshapeswdate.geom,0.1))
> AND cshapeswdate.gwsyear <=1946 AND cshapeswdate.gweyear >=1946 AND
> cshapeswdate.startdate <= '1946/1/1';
> 
> 
> 
> 2011/3/8 Andreas Forø Tollefsen <andreasft at gmail.com>
>         Results from the EXPLAIN ANALYZE: 
>         
>         
>         "Nested Loop  (cost=0.00..1189.72 rows=3941 width=87790)
>         (actual time=7.091..2524830.264 rows=54145 loops=1)"
>         "  Join Filter: _st_intersects(priogrid_land.cell,
>         cshapeswdate.geom)"
>         "  ->  Seq Scan on cshapeswdate  (cost=0.00..16.23 rows=22
>         width=87304) (actual time=0.011..0.542 rows=72 loops=1)"
>         "        Filter: ((gwsyear <= 1946::numeric) AND (gweyear >=
>         1946::numeric) AND (startdate <= '1946-01-01'::date))"
>         "  ->  Index Scan using idx_priogrid_land_cell on
>         priogrid_land  (cost=0.00..8.29 rows=1 width=486) (actual
>         time=3.026..30.152 rows=1338 loops=72)"
>         "        Index Cond: (priogrid_land.cell &&
>         cshapeswdate.geom)"
>         "Total runtime: 2524889.630 ms"
>         
>         
>         
>         2011/3/8 Andreas Forø Tollefsen <andreasft at gmail.com> 
>         
>         
>                 Hi, 
>                 This query takes about 41 minutes per year. Doing this
>                 for every year from 1946 to 2008 takes a lot of time.
>                 The grid consists of 64818 cell polygons, while the
>                 country table has about 210 polygons.
>                 
>                 
>                 I will add indexes on year and date and try again. An
>                 do an explain analyze.
>                 
>                 
>                 Cheers,
>                 Andreas
>                 
>                 2011/3/8 Paragon Corporation <lr at pcorp.us>
>                         
>                         Andreas,
>                         I don't see anything glaringly wrong with your
>                         query, but hard to tell without seeing an
>                         explain plan or what you mean by takes a lot
>                         of time.  Is a lot of time minutes, hours,
>                         days?
>                         How many records are we talking about here?
>                         What's the max number of points you have in
>                         any geometry.  Often times its just one huge
>                         mega geometry with a 100,000 points or more
>                         slowing everything down.
>                          
>                         Do you have indexes on your year fields and
>                         date columns?
>                          
>                         Hope that helps,
>                         Regina 
>                         http://www.postgis.us
>                          
>                         
>                         
>                         ______________________________________________
>                         From:
>                         postgis-users-bounces at postgis.refractions.net
>                         [mailto:postgis-users-bounces at postgis.refractions.net] On Behalf Of Andreas Forø Tollefsen
>                         Sent: Tuesday, March 08, 2011 4:15 AM
>                         To: PostGIS Users Discussion
>                         Subject: [postgis-users] Increase query
>                         performance
>                         
>                         
>                         
>                         
>                         Hi all, 
>                         
>                         
>                         Another question on postgis query performance.
>                         I did some discussion on the pgsql performance
>                         list on how i could increase the performance
>                         on my server.
>                         I did manage to increase the transactions per
>                         second, but came to the conclusion that this
>                         did not help the ST_Intersection query which i
>                         was trying to speed up.
>                         
>                         
>                         Any suggestions on how to speed up this query?
>                         Basically i want to create an intersection
>                         between a vector grid and country shapefiles.
>                         Then calculate the area of all the polygons in
>                         the intersected table, and finally selecting
>                         the country code for each cell which
>                         represents the largest area within each cell.
>                         I does work as it is, but i would like to
>                         increase the speed. The query which takes a
>                         lot of time is the ST_Intersection.
>                         
>                         
>                         "PostgreSQL 8.4.7 on i686-pc-linux-gnu,
>                         compiled by GCC gcc-4.4.real (Ubuntu/Linaro
>                         4.4.4-14ubuntu5) 4.4.5, 32-bit"
>                         
>                         
>                         "POSTGIS="2.0.0SVN" GEOS="3.2.2-CAPI-1.6.2"
>                         PROJ="Rel. 4.7.1, 23 September 2009"
>                         LIBXML="2.7.7" USE_STATS"
>                         
>                         
>                         
>                         
>                         DROP TABLE IF EXISTS cshapesgrid1946;
>                         
>                         
>                         SELECT ST_Intersection(priogrid_land.cell,
>                         cshapeswdate.geom) AS geom, priogrid_land.gid
>                         AS divider, gwcode, gwsyear, gweyear,
>                         startdate, enddate, capname, caplong, caplat,
>                         col, row, xcoord, ycoord INTO cshapesgrid1946
>                         FROM priogrid_land, cshapeswdate WHERE
>                         ST_Intersects(priogrid_land.cell,
>                         cshapeswdate.geom) AND cshapeswdate.gwsyear
>                         <=1946 AND cshapeswdate.gweyear >=1946 AND
>                         cshapeswdate.startdate <= '1946/1/1';
>                         
>                         
>                         ALTER TABLE cshapesgrid1946 ADD COLUMN area
>                         float;
>                         
>                         
>                         UPDATE cshapesgrid1946 SET area =
>                         ST_Area(cshapesgrid1946.geom);
>                         
>                         
>                         DROP TABLE IF EXISTS pg1946;
>                         
>                         
>                         SELECT * INTO pg1946 FROM cshapesgrid1946 a
>                         WHERE a.area IN (SELECT MAX(area) FROM
>                         cshapesgrid1946 b GROUP BY divider);
>                         
>                         
>                         CREATE INDEX idx_pg1946 ON pg1946 USING
>                         gist(geom);
>                         
>                         
>                         
>                         
>                         Best,
>                         Andreas
>                         
>                         
>                         _______________________________________________
>                         postgis-users mailing list
>                         postgis-users at postgis.refractions.net
>                         http://postgis.refractions.net/mailman/listinfo/postgis-users
>                         
>                 
>                 
>         
>         
> 
> 
> _______________________________________________
> postgis-users mailing list
> postgis-users at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-users