[postgis-users] Suggestions for Improving Query Performance?

Mike Treglia mtreglia at gmail.com
Tue Jul 26 18:42:44 PDT 2022


Hi All,

Sorry for the long email - just trying to offer enough context of what I'm
doing and what I've tried.

I'm effectively trying to do a moving window analysis, to calculate around
each point within a grid, the percent of specific landcovers. I actually
would tend to do this as a raster operation, but have struggled a bit with
this kind of thing on rasters in the past, already had the raster
vectorized, and have seen good performance with these types of intersection
analyses with vector data, even with the best raster approach... thus I'd
welcome suggestions on how this might be done fully as an efficient raster
process, but for now will focus on this vector based operation.

Below is the general query (query 1) I'm working with, with explain
(analyze, buffers) for a condition so it'll run (pgid <10), with some
annotations for explanation. Because the grid is derived from the raster
overlay, it does include points in the water, which aren't important to me
(but the area of water is relevant in the calculation). Thus, further down
I've included a version (Query 2) where I included an inner join of the
points on another set of boundaries, which does not add much time, and
ultimately is better in that it reduces the number of points analyzed. I
thought that this also might enable some parallelization, as that was
outside of the lateral join (which I understand does not work with parallel
processing), but it seems like it only leverages 1 logical core.

Its taking about 5-6 seconds for analysis of 10 points. I've run
st_subdivide and created spatial indexes for the layers (including trying a
BRIN index for the points); st_subdivide on the boundaries I'm interested
in helped with the second query but st_subdivide on the vectorized
landcover didn't help with the focal analysis I'm interested in (probably
because its already fairly simple). The timing points doesn't seem too bad,
except for the fact that I'd have about 100,000,000 points to do this
analysis for, so the time to run it all would be untenable. (In the end
I'll probably rasterize the output).

Any ideas on how I might improve performance? I'm still learning about how
to interpret explain, but it looks like the aggregate is the majority of
the time, so not sure how/where I can improve that.

For what its worth, I'm on Postgres 13.0, with PostGIS 3.1.1 on Windows 10.
If it seems like upgrades to Postgres or PostGIS versions would make a
meaningful difference, happy to try that.

Thanks so much for any thoughts!
Mike

-----------QUERY 1
-- create table landcover_window as
select
su.pgid,
su.geom_2263,
-- Calculate proportion of area that is tree canopy
(sum(cliparea) filter (where lc_2017_value=1)/sum(cliparea) filter (where
lc_2017_value is not null)) as pct_canopy,
--Calulate proportion of area that is grass/shrub
    (sum(cliparea) filter (where lc_2017_value=2)/sum(cliparea) filter
(where lc_2017_value is not null)) as pct_grassshrub,
--Calculate proportion of area that is either grassh/shrub or tree canopy

(sum(cliparea) filter (where lc_2017_value in (1,2))/sum(cliparea) filter
(where lc_2017_value is not null)) as _veg
-- This is the grid of points (simply derived from a raster overlay layer)
from base_rasters.o_16_landcover6in_2017_pixecentr as su
--Calculate the area of intersection of the buffer around the points with
the landcover
----Note - its important to keep all land cover classes in consideration so
that near edges,
---- when the buffer may extend beyond landcover data, the denominator in
above calculations is appropriate.
cross join lateral
(select lc_2017_value, sum(st_area(st_intersection(a.geom_2263,
st_buffer(su.geom_2263, 656)))) as cliparea
from base_rasters.landcover6in_2017_polygons as a where
st_dwithin(a.geom_2263, su.geom_2263, 656) group by a.lc_2017_value) foo
where su.pgid <=10 -- constrain the analysis for testing purposes
group by su.pgid,
su.geom_2263;

----------Explain (analyze, buffers) output
GroupAggregate  (cost=46412.87..7647152411521.97 rows=60510592 width=60)
(actual time=774.262..5952.815 rows=10 loops=1)
  Group Key: su.pgid
  Buffers: shared hit=7441
  ->  Nested Loop  (cost=46412.87..7647139250468.21 rows=484084736
width=52) (actual time=116.693..5952.652 rows=50 loops=1)
        Buffers: shared hit=7441
        ->  Index Scan using o_16_landcover6in_2017_pixecentr_pkey on
o_16_landcover6in_2017_pixecentr su  (cost=0.57..180883167.93 rows=60510592
width=36) (actual time=0.010..0.065 rows=10 loops=1)
              Index Cond: (pgid <= 10)
              Buffers: shared hit=5
        ->  GroupAggregate  (cost=46412.30..126373.72 rows=8 width=16)
(actual time=121.148..595.249 rows=5 loops=10)
              Group Key: a.lc_2017_value
              Buffers: shared hit=7436
              ->  Sort  (cost=46412.30..46416.29 rows=1595 width=1356)
(actual time=11.872..12.579 rows=3755 loops=10)
                    Sort Key: a.lc_2017_value
                    Sort Method: quicksort  Memory: 4115kB
                    Buffers: shared hit=7436
                    ->  Index Scan using
landcover6in_2017_polygons_geom_idx on landcover6in_2017_polygons a
 (cost=0.54..46327.46 rows=1595 width=1356) (actual time=0.070..9.757
rows=3755 loops=10)
                          Index Cond: (geom_2263 && st_expand(su.geom_2263,
'656'::double precision))
                          Filter: st_dwithin(geom_2263, su.geom_2263,
'656'::double precision)
                          Rows Removed by Filter: 885
                          Buffers: shared hit=7436
Planning:
  Buffers: shared hit=3
Planning Time: 0.207 ms
Execution Time: 5953.233 ms



----QUERY 2
select
su.pgid,
su.geom_2263,
(sum(cliparea) filter (where lc_2017_value=1)/sum(cliparea) filter (where
lc_2017_value is not null)) as pct_canopy,
    (sum(cliparea) filter (where lc_2017_value=2)/sum(cliparea) filter
(where lc_2017_value is not null)) as pct_grassshrub,
    (sum(cliparea) filter (where lc_2017_value in (1,2))/sum(cliparea)
filter (where lc_2017_value is not null)) as _veg
from base_rasters.o_16_landcover6in_2017_pixecentr as su
-- Joining another layer to confine the points considered to those within
land area.
inner join admin.nycdcp_nta2010_analyticalmashup_nowater_subdiv nnan on
st_intersects(su.geom_2263, nnan.geom_2263)
cross join lateral
(select lc_2017_value, sum(st_area(st_intersection(a.geom_2263,
st_buffer(su.geom_2263, 656)))) as cliparea
from base_rasters.landcover6in_2017_polygons_subdiv as a where
st_dwithin(a.geom_2263, su.geom_2263, 656) group by a.lc_2017_value) foo
where su.pgid <= 10
group by su.pgid,
su.geom_2263;
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20220726/d457e0fa/attachment.htm>


More information about the postgis-users mailing list