[postgis-devel] ST_CoveredBy performance question

Fri Jan 29 10:10:32 PST 2016

Hey,
you might want to use
http://explain.depesz.com/
for readeable explain analyse.

I don't know your test protocol,
but timing queries is difficult due to caching, stats, etc.

could you try your query with an implicit inner join ?

WITH "polygon" AS (
    SELECT the_geom::geometry
    FROM atlas
    WHERE id = 358437
)
SELECT count(*) AS agggregate
FROM "polygon" , "listings"
WHERE ST_CoveredBy("listings"."the_geom", "polygon"."the_geom")

ON a side note, if you want this to go faster, you can break your big
polygon into smaller ones (gridding,etc)
Cheers,
RémiC

2016-01-29 17:05 GMT+01:00 J Smith <dark.panda+lists at gmail.com>:

> G'day list.
>
> I posted this to postgis-users a few days ago but haven't seen any
> response yet, figured I'd try my luck here. Sorry if this is slightly
> off-topic to development, but as it involves performance, perhaps it's
> tangentially related.
>
> ...
>
> I was messing around with some queries today and found a performance
> enhancement I want to understand more. The data is essentially a
> simple ST_CoveredBy query involving a polygon with 8050 points and a
> number of points where I'm checking to see what points are covered by
> the polygon. The point data set consists of approximately 105,000
> points, of which 31000 or so are covered by the polygon.
>
> I have two queries, both of which are identical except for a call I
> make to ST_AsEWKB() and a cast back to geometry in the more performant
> of the two queries. The queries produce identical query plans
> according to EXPLAIN, but there's a definitive performance winner.
>
> Here are the queries along with the EXPLAIN ANALYZE output:
>
> explain analyze with "polygon" as (select the_geom as the_geom from
> atlas where id = 358437)
> select
>   count(*) as aggregate
> from
>   "listings"
>   inner join "polygon" on ST_CoveredBy("listings"."the_geom",
> "polygon"."the_geom");
>
>
>   QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=438.99..439.00 rows=1 width=0) (actual
> time=1187.632..1187.632 rows=1 loops=1)
>    CTE polygon
>      ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.44 rows=1
> width=7022) (actual time=0.013..0.015 rows=1 loops=1)
>            Index Cond: (id = 358437)
>    ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
> time=17.390..1182.191 rows=31188 loops=1)
>          ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
> (actual time=0.015..0.017 rows=1 loops=1)
>          ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
> width=32) (actual time=17.371..1173.810 rows=31188 loops=1)
>                Recheck Cond: (the_geom @ polygon.the_geom)
>                Filter: _st_coveredby(the_geom, polygon.the_geom)
>                Rows Removed by Filter: 2519
>                Heap Blocks: exact=5729
>                ->  Bitmap Index Scan on
> listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
> (actual time=14.919..14.919 rows=33707 loops=1)
>                      Index Cond: (the_geom @ polygon.the_geom)
>  Planning time: 0.255 ms
>  Execution time: 1187.693 ms
> (15 rows)
>
> Time: 1188.347 ms
>
>
>
> Same query, except we use `ST_AsEWKB(the_geom)::geometry`...
>
> explain analyze with "polygon" as (select
> ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =
> 358437)
> select
>   count(*) as aggregate
> from
>   "listings"
>   inner join "polygon" on ST_CoveredBy("listings"."the_geom",
> "polygon"."the_geom");
>
>
>  QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=439.00..439.01 rows=1 width=0) (actual
> time=361.939..361.940 rows=1 loops=1)
>    CTE polygon
>      ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.45 rows=1
> width=7022) (actual time=0.170..0.174 rows=1 loops=1)
>            Index Cond: (id = 358437)
>    ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual
> time=7.923..358.788 rows=31188 loops=1)
>          ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)
> (actual time=0.207..0.212 rows=1 loops=1)
>          ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35
> width=32) (actual time=7.711..353.850 rows=31188 loops=1)
>                Recheck Cond: (the_geom @ polygon.the_geom)
>                Filter: _st_coveredby(the_geom, polygon.the_geom)
>                Rows Removed by Filter: 2519
>                Heap Blocks: exact=5729
>                ->  Bitmap Index Scan on
> listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)
> (actual time=5.229..5.229 rows=33707 loops=1)
>                      Index Cond: (the_geom @ polygon.the_geom)
>  Planning time: 0.266 ms
>  Execution time: 361.998 ms
>
>
> The second query is over 3 times faster in execution time. Both
> produce identical results.
>
> I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.
>
> My going theory is that this is a case where the Postgres optimizer is
> seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away
> any repeated uses of it when comparing against the points. Why
> wouldn't Postgres do this in the first query, when the_geom field is
> not changing at any point? The the_geom field in the CTE isn't going
> to change and is effective IMMUTABLE as well.
>
> Anybody have any better insight here? Am I way off-base, or does this
> sound sane?
>
> Cheers
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20160129/b2c68e4f/attachment.html>