[postgis-users] ST_CoveredBy performance question

Wed Jan 27 20:53:34 PST 2016

G'day list.

I was messing around with some queries today and found a performance
enhancement I want to understand more. The data is essentially a
simple ST_CoveredBy query involving a polygon with 8050 points and a
number of points where I'm checking to see what points are covered by
the polygon. The point data set consists of approximately 105,000
points, of which 31000 or so are covered by the polygon.

I have two queries, both of which are identical except for a call I
make to ST_AsEWKB() and a cast back to geometry in the more performant
of the two queries. The queries produce identical query plans
according to EXPLAIN, but there's a definitive performance winner.

Here are the queries along with the EXPLAIN ANALYZE output:

explain analyze with "polygon" as (select the_geom as the_geom from
atlas where id = 358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");

  QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=438.99..439.00 rows=1 width=0) (actual
time=1187.632..1187.632 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.44 rows=1
width=7022) (actual time=0.013..0.015 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=17.390..1182.191 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.015..0.017 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=17.371..1173.810 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=14.919..14.919 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.255 ms
 Execution time: 1187.693 ms
(15 rows)

Time: 1188.347 ms

Same query, except we use `ST_AsEWKB(the_geom)::geometry`...

explain analyze with "polygon" as (select
ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =
358437)
select
  count(*) as aggregate
from
  "listings"
  inner join "polygon" on ST_CoveredBy("listings"."the_geom",
"polygon"."the_geom");

 QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=439.00..439.01 rows=1 width=0) (actual
time=361.939..361.940 rows=1 loops=1)
   CTE polygon
     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.45 rows=1
width=7022) (actual time=0.170..0.174 rows=1 loops=1)
           Index Cond: (id = 358437)
   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
time=7.923..358.788 rows=31188 loops=1)
         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
(actual time=0.207..0.212 rows=1 loops=1)
         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
width=32) (actual time=7.711..353.850 rows=31188 loops=1)
               Recheck Cond: (the_geom @ polygon.the_geom)
               Filter: _st_coveredby(the_geom, polygon.the_geom)
               Rows Removed by Filter: 2519
               Heap Blocks: exact=5729
               ->  Bitmap Index Scan on
listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
(actual time=5.229..5.229 rows=33707 loops=1)
                     Index Cond: (the_geom @ polygon.the_geom)
 Planning time: 0.266 ms
 Execution time: 361.998 ms

The second query is over 3 times faster in execution time. Both
produce identical results.

I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.

My going theory is that this is a case where the Postgres optimizer is
seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away
any repeated uses of it when comparing against the points. Why
wouldn't Postgres do this in the first query, when the_geom field is
not changing at any point? The the_geom field in the CTE isn't going
to change and is effective IMMUTABLE as well.

Anybody have any better insight here? Am I way off-base, or does this
sound sane?

Cheers