[postgis-users] Much Faster Point in Polygon Count using KEYWORD LATERAL
Paragon Corporation
lr at pcorp.us
Wed Mar 11 18:35:47 PDT 2015
Mark,
I wasn't expecting the LATERAL change to provide such dramatic performance,
so that's good to know. So I guess since you need a GROUP BY in the
non-LATERAL case, that may be a good chunk of the reason as you mentioned
the plan suggested.
It would be interesting to see what happens if you experiment with disabling
certain strategies. I think a non-LATERAL has a lot more options as to the
approach it takes e.g. nested vs. hash, group agg etc where as the LATERAL I
think goes by a nested loop strategy. So it might be partly the plan it
takes as well and you may achieve similar performance benefits by say
disabling group_agg.
1) Regarding CROSS JOIN LATERAL. As you observed a CROSS JOIN LATERAL is
equivalent to an INNER JOIN because since the LATERAL is doing a subquery
for each row in the first table, it's already got a built-in WHERE.
2) GIST indexes work just dandy in LEFT JOINs, so I wouldn't say they are
ineffectual. The planner selectivity is a little more off is all, so you
are more likely to run into suboptimal plans with a LEFT than an INNER.
3) Your && is redundant since ST_Intersects has a built-in &&. Can you
verify that is the case. On rare occasions the _ST_Intersects call gets
used before the && which shouldn't happen these days since we upped the cost
of that function.
4) The LEFT JOIN behavior you are getting with doing count in your LATERAL
query is a side-affect of aggregation and not really the way you'd approach
a LEFT JON in LATERAL general case.
For example -- SELECT count(*) where false; gives you an answer of 0
instead of no records returned.
The general case of doing a LEFT JOIN with LATERAL would be to just set the
ON to true.
SELECT a.*, l.* FROM a LEFT JOIN LATERAL (SELECT something FROM mylat WHERE
ST_Intersects(mylat.geom, a.geom) ) As l ON true;
Hope that helps,
Regina
http://www.postgis.us
http://postgis.net
----- ORIGINAL MESSAGE --
Subject: Much Faster Point in Polygon Count using KEYWORD LATERAL
Date: 8 March 2015 12:28:32 pm ACDT
To: postgis-users at lists.osgeo.org
Hi All
To provide some context, I'm in the midst of refactoring my recursive
quadgrid function which I'll share with everyone in the next few days.
I've got a table with many millions of points and I so I've been playing
with different query patterns to find the fastest way possible of counting
points within a set of quadcells.
The way I've done this traditionally is to put a GIST index on the points
table, and then use a CROSS JOIN query of the form:
SELECT l.the_geom, count(p.pid)
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM
abs_aus11_tiles_32k WHERE tid IN (17865)) l,
tutorials.abs_mb11_points p
WHERE ST_Intersects(l.the_geom, p.wkb_geometry) AND l.the_geom &&
p.wkb_geometry GROUP BY 1;
"0103000...";688154
"0103000...";473202
"0103000...";84516
About 4.4 seconds for sum 1.2 million points (not startling). Plus there's
a grid cell with zero points, hence this geometry doesn't get returned by
this query pattern. Bummer. When I look at the query plan, the GROUP BY
clause seems to be acting as a significant handbrake on the query
performance.
I've also looked at a LEFT JOIN instead of a CROSS JOIN to solve the problem
of the non-returning quad cell, but I understand GIST indexes are
ineffectual on LEFT JOINS.
A faster way that I've found is to incorporate the KEYWORD 'LATERAL' into
the join. This allows me to reference the left table in the right table:
SELECT l.the_geom, r.pcount
FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM
abs_aus11_tiles_32k WHERE tid IN (17865)) l,
LATERAL
(SELECT count(*) as pcount, l.the_geom FROM tutorials.abs_mb11_points WHERE
ST_Intersects(l.the_geom, wkb_geometry) AND l.the_geom && wkb_geometry) r;
"0103000."; 473202
"0103000."; 84516
"0103000."; 0
"0103000."; 688154
This cuts the query time down to 1.7 seconds, mostly be avoiding the need to
use GROUP BY, and all quadcells are returned even those with zero points in
polygons. In effect the query appears to be mimicking a LEFT JOIN.
I was expecting that the LATERAL CROSS JOIN would have produced the
cartesian product of the two tables, and that I would have to enforce a
WHERE condition ST_Equals(l.the_geom, r.the_geom);. Alas not.
Just wondering if anyone else has encountered this when using LATERAL, and
could shed some light on why this is the case?
More information about the postgis-users
mailing list