[postgis-users] Much Faster Point in Polygon Count using KEYWORD LATERAL

Mark Wynter mark at dimensionaledge.com
Mon Mar 9 18:07:31 PDT 2015


Hi All
Got zero response to my question below, but the following LATERAL QUERY PATTERN is achieving a 75% reduction in query times for big point in polygon counts.
This seems pretty cool - again, has anyone else come across anything like this? 
I’m keen to hear other people’s experiences.
The results seem too good to ignore.

BACKGROUND

Table has 21 million points, with a GIST index.
The  task is to count points within 4 quadcells.
In total 1.86 Million points intersect the quadcells.

TRADITIONAL QUERY PATTERN - OLD WAY

SELECT l.the_geom, count(p.pid) FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17864)) l,
tutorials.abs_mb11_points p WHERE ST_Intersects(l.the_geom, p.wkb_geometry) AND l.the_geom && p.wkb_geometry GROUP BY l.the_geom

11.7 seconds - first time (reboot, nothing in cache)
11.2 seconds - second time (post-cache)

LATERAL QUERY PATTERN - NEW WAY

SELECT l.the_geom, r.pcount FROM
(SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17864)) l,
LATERAL
(SELECT l.the_geom, count(*) as pcount FROM tutorials.abs_mb11_points WHERE ST_Intersects(l.the_geom, wkb_geometry) AND l.the_geom && wkb_geometry) r;

3.1 seconds - first time (reboot, nothing in cache)
2.4 seconds - second time (post-cache)



On 8 Mar 2015, at 12:28 pm, Mark Wynter <mark at dimensionaledge.com> wrote:

> Hi All
> 
> To provide some context, I’m in the midst of refactoring my recursive quadgrid function which I’ll share with everyone in the next few days.
> 
> I’ve got a table with many millions of points and I so I’ve been playing with different query patterns to find the fastest way possible of counting points within a set of quadcells.  
> 
> The way I’ve done this traditionally is to put a GIST index on the points table, and then use a CROSS JOIN query of the form:
> 
> SELECT l.the_geom, count(p.pid)
> FROM
> (SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17865)) l,
> tutorials.abs_mb11_points p
> WHERE ST_Intersects(l.the_geom, p.wkb_geometry) AND l.the_geom && p.wkb_geometry GROUP BY 1;
> 
> “0103000...";688154
> “0103000...";473202
> “0103000...";84516
> 
> About 4.4 seconds for sum 1.2 million points (not startling).  Plus there’s a grid cell with zero points, hence this geometry doesn’t get returned by this query pattern.  Bummer.   When I look at the query plan, the GROUP BY clause seems to be acting as a significant handbrake on the query performance.
> 
> I’ve also looked at a LEFT JOIN instead of a CROSS JOIN to solve the problem of the non-returning quad cell, but I understand GIST indexes are ineffectual on LEFT JOINS.
> 
> A faster way that I’ve found is to incorporate the KEYWORD ‘LATERAL’ into the join.  This allows me to reference the left table in the right table:
> 
> SELECT l.the_geom, r.pcount
> FROM
> (SELECT DE_MakeRegularQuadCells(wkb_geometry) as the_geom FROM abs_aus11_tiles_32k WHERE tid IN (17865)) l,
> LATERAL
> (SELECT count(*) as pcount, l.the_geom FROM tutorials.abs_mb11_points WHERE ST_Intersects(l.the_geom, wkb_geometry) AND l.the_geom && wkb_geometry) r;
> 
> "0103000…"; 473202
> "0103000…"; 84516
> "0103000…”; 0
> "0103000…"; 688154
> 
> This cuts the query time down to 1.7 seconds, mostly be avoiding the need to use GROUP BY, and all quadcells are returned even those with zero points in polygons.  In effect the query appears to be mimicking a LEFT JOIN.
> 
> I was expecting that the LATERAL CROSS JOIN would have produced the cartesian product of the two tables, and that I would have to enforce a WHERE condition ST_Equals(l.the_geom, r.the_geom);.  Alas not.
> 
> Just wondering if anyone else has encountered this when using LATERAL, and could shed some light on why this is the case?
> 



More information about the postgis-users mailing list