<div dir="ltr"><div class="gmail_default" style="font-family:monospace,monospace">Hey,<br></div><div class="gmail_default" style="font-family:monospace,monospace">you might want to use <br><a href="http://explain.depesz.com/">http://explain.depesz.com/</a><br></div><div class="gmail_default" style="font-family:monospace,monospace">for readeable explain analyse.<br><br></div><div class="gmail_default" style="font-family:monospace,monospace">I don't know your test protocol,<br></div><div class="gmail_default" style="font-family:monospace,monospace">but timing queries is difficult due to caching, stats, etc.<br></div><div class="gmail_default" style="font-family:monospace,monospace"><br></div><div class="gmail_default" style="font-family:monospace,monospace">could you try your query with an implicit inner join ? <br><br>WITH "polygon" AS (<br></div><div class="gmail_default" style="font-family:monospace,monospace">    SELECT the_geom::geometry<br></div><div class="gmail_default" style="font-family:monospace,monospace">    FROM atlas <br></div><div class="gmail_default" style="font-family:monospace,monospace">    WHERE id = 358437<br></div><div class="gmail_default" style="font-family:monospace,monospace">)<br>SELECT count(*) AS agggregate<br>FROM "polygon" , "listings"<br></div><div class="gmail_default" style="font-family:monospace,monospace">WHERE ST_CoveredBy("listings"."the_geom", "polygon"."the_geom")   <br><br></div><div class="gmail_default" style="font-family:monospace,monospace">ON a side note, if you want this to go faster, you can break your big polygon into smaller ones (gridding,etc)<br></div><div class="gmail_default" style="font-family:monospace,monospace">Cheers,<br></div><div class="gmail_default" style="font-family:monospace,monospace">RémiC<br></div><div class="gmail_default" style="font-family:monospace,monospace">

<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2016-01-29 17:05 GMT+01:00 J Smith <span dir="ltr"><<a href="mailto:dark.panda+lists@gmail.com" target="_blank">dark.panda+lists@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">G'day list.<br>

<br>

I posted this to postgis-users a few days ago but haven't seen any<br>

response yet, figured I'd try my luck here. Sorry if this is slightly<br>

off-topic to development, but as it involves performance, perhaps it's<br>

tangentially related.<br>

<br>

...<br>

<br>

I was messing around with some queries today and found a performance<br>

enhancement I want to understand more. The data is essentially a<br>

simple ST_CoveredBy query involving a polygon with 8050 points and a<br>

number of points where I'm checking to see what points are covered by<br>

the polygon. The point data set consists of approximately 105,000<br>

points, of which 31000 or so are covered by the polygon.<br>

<br>

I have two queries, both of which are identical except for a call I<br>

make to ST_AsEWKB() and a cast back to geometry in the more performant<br>

of the two queries. The queries produce identical query plans<br>

according to EXPLAIN, but there's a definitive performance winner.<br>

<br>

Here are the queries along with the EXPLAIN ANALYZE output:<br>

<br>

explain analyze with "polygon" as (select the_geom as the_geom from<br>

atlas where id = 358437)<br>

select<br>

  count(*) as aggregate<br>

from<br>

  "listings"<br>

  inner join "polygon" on ST_CoveredBy("listings"."the_geom",<br>

"polygon"."the_geom");<br>

<br>

<br>

  QUERY PLAN<br>

------------------------------------------------------------------------------------------------------------------------------------------------------------<br>

 Aggregate  (cost=438.99..439.00 rows=1 width=0) (actual<br>

time=1187.632..1187.632 rows=1 loops=1)<br>

   CTE polygon<br>

     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.44 rows=1<br>

width=7022) (actual time=0.013..0.015 rows=1 loops=1)<br>

           Index Cond: (id = 358437)<br>

   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual<br>

time=17.390..1182.191 rows=31188 loops=1)<br>

         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)<br>

(actual time=0.015..0.017 rows=1 loops=1)<br>

         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35<br>

width=32) (actual time=17.371..1173.810 rows=31188 loops=1)<br>

               Recheck Cond: (the_geom @ polygon.the_geom)<br>

               Filter: _st_coveredby(the_geom, polygon.the_geom)<br>

               Rows Removed by Filter: 2519<br>

               Heap Blocks: exact=5729<br>

               ->  Bitmap Index Scan on<br>

listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)<br>

(actual time=14.919..14.919 rows=33707 loops=1)<br>

                     Index Cond: (the_geom @ polygon.the_geom)<br>

 Planning time: 0.255 ms<br>

 Execution time: 1187.693 ms<br>

(15 rows)<br>

<br>

Time: 1188.347 ms<br>

<br>

<br>

<br>

Same query, except we use `ST_AsEWKB(the_geom)::geometry`...<br>

<br>

explain analyze with "polygon" as (select<br>

ST_AsEWKB(the_geom)::geometry as the_geom from atlas where id =<br>

358437)<br>

select<br>

  count(*) as aggregate<br>

from<br>

  "listings"<br>

  inner join "polygon" on ST_CoveredBy("listings"."the_geom",<br>

"polygon"."the_geom");<br>

<br>

<br>

 QUERY PLAN<br>

----------------------------------------------------------------------------------------------------------------------------------------------------------<br>

 Aggregate  (cost=439.00..439.01 rows=1 width=0) (actual<br>

time=361.939..361.940 rows=1 loops=1)<br>

   CTE polygon<br>

     ->  Index Scan using atlas_pkey on atlas  (cost=0.42..8.45 rows=1<br>

width=7022) (actual time=0.170..0.174 rows=1 loops=1)<br>

           Index Cond: (id = 358437)<br>

   ->  Nested Loop  (cost=5.08..430.46 rows=35 width=0) (actual<br>

time=7.923..358.788 rows=31188 loops=1)<br>

         ->  CTE Scan on polygon  (cost=0.00..0.02 rows=1 width=32)<br>

(actual time=0.207..0.212 rows=1 loops=1)<br>

         ->  Bitmap Heap Scan on listings  (cost=5.08..430.09 rows=35<br>

width=32) (actual time=7.711..353.850 rows=31188 loops=1)<br>

               Recheck Cond: (the_geom @ polygon.the_geom)<br>

               Filter: _st_coveredby(the_geom, polygon.the_geom)<br>

               Rows Removed by Filter: 2519<br>

               Heap Blocks: exact=5729<br>

               ->  Bitmap Index Scan on<br>

listings_the_geom_spatial_index  (cost=0.00..5.08 rows=106 width=0)<br>

(actual time=5.229..5.229 rows=33707 loops=1)<br>

                     Index Cond: (the_geom @ polygon.the_geom)<br>

 Planning time: 0.266 ms<br>

 Execution time: 361.998 ms<br>

<br>

<br>

The second query is over 3 times faster in execution time. Both<br>

produce identical results.<br>

<br>

I'm seeing this on Postgres 9.4.5 using both PostGIS 2.1.8 and 2.2.1.<br>

<br>

My going theory is that this is a case where the Postgres optimizer is<br>

seeing that ST_AsEWKB is marked as IMMUTABLE and is optimizing away<br>

any repeated uses of it when comparing against the points. Why<br>

wouldn't Postgres do this in the first query, when the_geom field is<br>

not changing at any point? The the_geom field in the CTE isn't going<br>

to change and is effective IMMUTABLE as well.<br>

<br>

Anybody have any better insight here? Am I way off-base, or does this<br>

sound sane?<br>

<br>

Cheers<br>

_______________________________________________<br>

postgis-devel mailing list<br>

<a href="mailto:postgis-devel@lists.osgeo.org">postgis-devel@lists.osgeo.org</a><br>

<a href="http://lists.osgeo.org/mailman/listinfo/postgis-devel" rel="noreferrer" target="_blank">http://lists.osgeo.org/mailman/listinfo/postgis-devel</a></blockquote></div><br></div>