[postgis-users] Much Faster Point in Polygon Count using KEYWORD LATERAL

Mark Wynter mark at dimensionaledge.com
Thu Mar 12 20:32:23 PDT 2015


Thanks Regina for your reply - some of these nuances are rather new to me, so I’ll work through each of your points, sharing with you the query planner results.  It seems the GroupAggregate cost is the root cause of the performance difference.

> It would be interesting to see what happens if you experiment with disabling
> certain strategies.  I think a non-LATERAL has a lot more options as to the
> approach it takes e.g. nested vs. hash, group agg etc where as the LATERAL I
> think goes by a nested loop strategy.  So it might be partly the plan it
> takes as well and you may achieve similar performance benefits by say
> disabling group_agg.

How do you explicitly disable groupagg?   I couldn’t find any reference - is it a case of SET enable_hashagg=ON ?  And setting ample working memory?

SET enable_hashagg=ON
SET work_mem = '1000MB';

EXPLAIN ANALYZE
SELECT a.the_geom, count(p.pid)
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM
abs_aus11_tiles_32k WHERE tid IN (17865)) a,
tutorials.abs_mb11_points p
WHERE ST_Intersects(a.the_geom, p.wkb_geometry)
AND a.the_geom && p.wkb_geometry
GROUP BY 1;

"GroupAggregate  (cost=6007.56..6063.19 rows=200 width=36) (actual time=10221.081..11452.684 rows=3 loops=1)"
"  ->  Sort  (cost=6007.56..6025.44 rows=7151 width=36) (actual time=8703.428..9925.755 rows=1245872 loops=1)"
"        Sort Key: (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry))"
"        Sort Method: quicksort  Memory: 380087kB"
"        ->  Nested Loop  (cost=0.42..5549.76 rows=7151 width=36) (actual time=3.868..6304.275 rows=1245872 loops=1)"
"              ->  Seq Scan on abs_aus11_tiles_32k  (cost=0.00..850.26 rows=1000 width=120) (actual time=3.819..4.736 rows=4 loops=1)"
"                    Filter: (tid = 17865)"
"                    Rows Removed by Filter: 24320"
"              ->  Index Scan using idx_abs_mb11_points_geom on abs_mb11_points p  (cost=0.42..4.68 rows=1 width=52) (actual time=0.042..923.744 rows=311468 loops=4)"
"                    Index Cond: (((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)) && wkb_geometry) AND ((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)) && wkb_geometry))"
"                    Filter: _st_intersects((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)), wkb_geometry)"
"                    Rows Removed by Filter: 39"
"Total runtime: 11539.734 ms"


> 3) Your && is redundant since ST_Intersects has a built-in &&.  Can you
> verify that is the case.  On rare occasions the _ST_Intersects call gets
> used before the && which shouldn't happen these days since we upped the cost
> of that function.

You’re right!

EXPLAIN ANALYZE
SELECT a.the_geom, count(p.pid)
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM
abs_aus11_tiles_32k WHERE tid IN (17865)) a,
tutorials.abs_mb11_points p
WHERE ST_Intersects(a.the_geom, p.wkb_geometry)
--AND a.the_geom && p.wkb_geometry
GROUP BY 1;

"GroupAggregate  (cost=1961858.76..2015494.05 rows=200 width=36) (actual time=9987.600..11179.192 rows=3 loops=1)"
"  ->  Sort  (cost=1961858.76..1979736.52 rows=7151105 width=36) (actual time=8442.416..9660.582 rows=1245872 loops=1)"
"        Sort Key: (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry))"
"        Sort Method: quicksort  Memory: 380087kB"
"        ->  Nested Loop  (cost=45.18..1147714.94 rows=7151105 width=36) (actual time=53.340..6061.137 rows=1245872 loops=1)"
"              ->  Seq Scan on abs_aus11_tiles_32k  (cost=0.00..850.26 rows=1000 width=120) (actual time=3.259..3.917 rows=4 loops=1)"
"                    Filter: (tid = 17865)"
"                    Rows Removed by Filter: 24320"
"              ->  Bitmap Heap Scan on abs_mb11_points p  (cost=45.18..1139.70 rows=715 width=52) (actual time=32.949..864.123 rows=311468 loops=4)"
"                    Recheck Cond: ((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)) && wkb_geometry)"
"                    Filter: _st_intersects((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)), wkb_geometry)"
"                    Rows Removed by Filter: 39"
"                    ->  Bitmap Index Scan on idx_abs_mb11_points_geom  (cost=0.00..45.00 rows=2145 width=0) (actual time=32.189..32.189 rows=311507 loops=4)"
"                          Index Cond: ((de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)) && wkb_geometry)"
"Total runtime: 11265.073 ms"


> 
> 2) GIST indexes work just dandy in LEFT JOINs, so I wouldn't say they are
> ineffectual.  The planner selectivity is a little more off is all, so you
> are more likely to run into suboptimal plans with a LEFT than an INNER.

Poorly worded on my part.  I did read somewhere that GIST indexes didn’t have any effect on LEFT JOINs - possibly the world has moved on, but I still see this message:
NOTICE:  gserialized_gist_joinsel: jointype 1 not supported
This message doesn’t appear on INNER JOIN - only LEFT JOIN
Query times are identical which suggest the NOTICE is superfluous.

EXPLAIN ANALYZE
SELECT a.the_geom, l.pcount
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17865)) a 
LEFT JOIN LATERAL
(SELECT count(mylat.pid) as pcount, a.the_geom FROM tutorials.abs_mb11_points mylat WHERE ST_Intersects(mylat.wkb_geometry, a.the_geom)) l
ON true;


"Nested Loop Left Join  (cost=4695.34..4696229.19 rows=1000 width=40) (actual time=1674.913..4609.229 rows=4 loops=1)"
"  ->  Seq Scan on abs_aus11_tiles_32k  (cost=0.00..850.26 rows=1000 width=120) (actual time=2.740..3.542 rows=4 loops=1)"
"        Filter: (tid = 17865)"
"        Rows Removed by Filter: 24320"
"  ->  Aggregate  (cost=4695.34..4695.35 rows=1 width=4) (actual time=1151.411..1151.412 rows=1 loops=4)"
"        ->  Bitmap Heap Scan on abs_mb11_points mylat  (cost=46.68..4693.55 rows=715 width=4) (actual time=32.080..805.541 rows=311468 loops=4)"
"              Recheck Cond: (wkb_geometry && (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"              Filter: _st_intersects(wkb_geometry, (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"              Rows Removed by Filter: 39"
"              ->  Bitmap Index Scan on idx_abs_mb11_points_geom  (cost=0.00..46.50 rows=2145 width=0) (actual time=31.380..31.380 rows=311507 loops=4)"
"                    Index Cond: (wkb_geometry && (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"Total runtime: 4609.291 ms"


> 4) The LEFT JOIN behavior you are getting with doing count in your LATERAL
> query is a side-affect of aggregation and not really the way you'd approach
> a LEFT JON in LATERAL general case.

I see the nuance.  INNER JOIN it should be   :-)

Unless there’s another way of avoiding the performance cost of GROUP BY, then the query times of  INNER JOIN LATERAL seem really compelling...

EXPLAIN ANALYZE
SELECT a.the_geom, l.pcount
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17865)) a 
INNER JOIN LATERAL
(SELECT count(mylat.pid) as pcount, a.the_geom FROM tutorials.abs_mb11_points mylat WHERE ST_Intersects(mylat.wkb_geometry, a.the_geom)) l
ON true;

"Nested Loop  (cost=4695.34..4696229.19 rows=1000 width=40) (actual time=1999.832..4613.112 rows=4 loops=1)"
"  ->  Seq Scan on abs_aus11_tiles_32k  (cost=0.00..850.26 rows=1000 width=120) (actual time=2.819..3.468 rows=4 loops=1)"
"        Filter: (tid = 17865)"
"        Rows Removed by Filter: 24320"
"  ->  Aggregate  (cost=4695.34..4695.35 rows=1 width=4) (actual time=1152.403..1152.404 rows=1 loops=4)"
"        ->  Bitmap Heap Scan on abs_mb11_points mylat  (cost=46.68..4693.55 rows=715 width=4) (actual time=31.539..802.347 rows=311468 loops=4)"
"              Recheck Cond: (wkb_geometry && (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"              Filter: _st_intersects(wkb_geometry, (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"              Rows Removed by Filter: 39"
"              ->  Bitmap Index Scan on idx_abs_mb11_points_geom  (cost=0.00..46.50 rows=2145 width=0) (actual time=30.843..30.843 rows=311507 loops=4)"
"                    Index Cond: (wkb_geometry && (de_makeregularquadcells(abs_aus11_tiles_32k.wkb_geometry)))"
"Total runtime: 4613.180 ms"




More information about the postgis-users mailing list