[postgis-devel] ST_CoveredBy performance question
Rémi Cura
remi.cura at gmail.com
Fri Jan 29 10:10:32 PST 2016
Hey,
you might want to use
http://explain.depesz.com/
for readeable explain analyse.
I don't know your test protocol,
but timing queries is difficult due to caching, stats, etc.
could you try your query with an implicit inner join ?
WITH "polygon" AS (
SELECT the_geom::geometry
FROM atlas
WHERE id = 358437
)
SELECT count(*) AS agggregate
FROM "polygon" , "listings"
WHERE ST_CoveredBy("listings"."the_geom", "polygon"."the_geom")
ON a side note, if you want this to go faster, you can break your big
polygon into smaller ones (gridding,etc)
Cheers,
RémiC
2016-01-29 17:05 GMT+01:00 J Smith <dark.panda+lists at gmail.com>:
> G'day list.
>
> I posted this to postgis-users a few days ago but haven't seen any
> response yet, figured I'd try my luck here. Sorry if this is slightly
> off-topic to development, but as it involves performance, perhaps it's
> tangentially related.
>
> ...
>
> I was messing around with some queries today and found a performance
> enhancement I want to understand more. The data is essentially a
> simple ST_CoveredBy query involving a polygon with 8050 points and a
> number of points where I'm checking to see what points are covered by
> the polygon. The point data set consists of approximately 105,000
> points, of which 31000 or so are covered by the polygon.
>
> I have two queries, both of which are identical except for a call I
> make to ST_AsEWKB() and a cast back to geometry in the more performant
> of the two queries. The queries produce identical query plans
> according to EXPLAIN, but there's a definitive performance winner.
>
> Here are the queries along with the EXPLAIN ANALYZE output:
>
> explain analyze with "polygon" as (select the_geom as the_geom from
> atlas where id = 358437)
> select
> count(*) as aggregate
> from
> "listings"
> inner join "polygon" on ST_CoveredBy("listings"."the_geom",
> "polygon"."the_geom");
>
>
> QUERY PLAN
>
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=438.99..439.00 rows=1 width=0) (actual
> time=1187.632..1187.632 rows=1 loops=1)
> CTE polygon
> -> Index Scan using atlas_pkey on atlas (cost=0.42..8.44 rows=1
> width=7022) (actual time=0.013..0.015 rows=1 loops=1)
> Index Cond: (id = 358437)
> -> Nested Loop (cost=5.08..430.46 rows=35 width=0) (actual
> time=17.390..1182.191 rows=31188 loops=1)
> -> CTE Scan on polygon (cost=0.00..0.02 rows=1 width=32)
> (actual time=0.015..0.017 rows=1 loops=1)
> -> Bitmap Heap Scan on listings (cost=5.08..430.09 rows=35
> width=32) (actual time=17.371..1173.810 rows=31188 loops=1)
> Recheck Cond: (the_geom @ polygon.the_geom)
> Filter: _st_coveredby(the_geom, polygon.the_geom)
> Rows Removed by Filter: 2519
> Heap Blocks: exact=5729
> -> Bitmap Index Scan on
> listings_the_geom_spatial_index (cost=0.00..5.08 rows=106 width=0)
> (actual time=14.919..14.919 rows=33707 loops=1)
> Index Cond: (the_geom @ polygon.the_geom)
> Planning time: 0.255 ms
> Execution time: 1187.693 ms
> (15 rows)
>
> Time: 1188.347 ms
>
>
>
> Same query, except we use `ST_AsEWKB(the_geom)::geometry`...
>
> explain analyze with "polygon" as (select
> ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =
> 358437)
> select
> count(*) as aggregate
> from
> "listings"
> inner join "polygon" on ST_CoveredBy("listings"."the_geom",
> "polygon"."the_geom");
>
>
> QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=439.00..439.01 rows=1 width=0) (actual
> time=361.939..361.940 rows=1 loops=1)
> CTE polygon
> -> Index Scan using atlas_pkey on atlas (cost=0.42..8.45 rows=1
> width=7022) (actual time=0.170..0.174 rows=1 loops=1)
> Index Cond: (id = 358437)
> -> Nested Loop (cost=5.08..430.46 rows=35 width=0) (actual
> time=7.923..358.788 rows=31188 loops=1)
> -> CTE Scan on polygon (cost=0.00..0.02 rows=1 width=32)
> (actual time=0.207..0.212 rows=1 loops=1)
> -> Bitmap Heap Scan on listings (cost=5.08..430.09 rows=35
> width=32) (actual time=7.711..353.850 rows=31188 loops=1)
> Recheck Cond: (the_geom @ polygon.the_geom)
> Filter: _st_coveredby(the_geom, polygon.the_geom)
> Rows Removed by Filter: 2519
> Heap Blocks: exact=5729
> -> Bitmap Index Scan on
> listings_the_geom_spatial_index (cost=0.00..5.08 rows=106 width=0)
> (actual time=5.229..5.229 rows=33707 loops=1)
> Index Cond: (the_geom @ polygon.the_geom)
> Planning time: 0.266 ms
> Execution time: 361.998 ms
>
>
> The second query is over 3 times faster in execution time. Both
> produce identical results.
>
> I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.
>
> My going theory is that this is a case where the Postgres optimizer is
> seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away
> any repeated uses of it when comparing against the points. Why
> wouldn't Postgres do this in the first query, when the_geom field is
> not changing at any point? The the_geom field in the CTE isn't going
> to change and is effective IMMUTABLE as well.
>
> Anybody have any better insight here? Am I way off-base, or does this
> sound sane?
>
> Cheers
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/postgis-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20160129/b2c68e4f/attachment.html>
More information about the postgis-devel
mailing list