[postgis-users] Performance boost with ST_ValueCount() - wondering why

Tue Jan 14 16:28:07 PST 2020

> As a test, you could try to disable JIT compilation in the PostgreSQL 12
> DB server, and try again.

Hi Giuseppe,

I ended up posting the issue to the Postgres Performance list yesterday
(pgsql-performance) and turning off JIT was suggested. I did and yet the
results were essentially the same (query plan, buffers and response time)
as when JIT was enabled.

Thanks again for thinking about this question.

Shira

On Tue, Jan 14, 2020 at 9:18 AM Giuseppe Broccolo <g.broccolo.7 at gmail.com>
wrote:

> Hi Shira,
>
> Thanks for the plans! Didn't know there was a way to "reverse engineering"
> plans from the Depsez links :D
>
> Looks you have enabled JIT compilation in the PostgreSQL 12 server (
> https://www.postgresql.org/docs/11/jit-reason.html).
> That could explain the speedup in query execution!
>
> JIT compilation improve how tuple are "deformed" (i.e. the process of
> transforming on-disk tuples into in-memory representation),
> so shared buffers population is optimised. During its first implementation
> in PostgreSQL there were tests showing how aggregations
> could benefit from JIT compilation in order to be executed with a 40% less
> of time (
> https://www.postgresql.org/message-id/5a18282d-89d0-ba21-4d54-bc2259ad7f26%40postgrespro.ru
> ).
>
> I'm expecting that in PostgreSQL 12 things are even better!
>
> As a test, you could try to disable JIT compilation in the PostgreSQL 12
> DB server, and try again.
>
> Hope this insight can help,
> Giuseppe.
>
> Il giorno lun 13 gen 2020 alle ore 16:21 Shira Bezalel <shira at sfei.org>
> ha scritto:
>
>> Hi Giuseppe,
>>
>> Here is the original explain output (You can also get this from clicking
>> on the "Source" tab in the Depsez links):
>>
>> 9.6
>>
>> HashAggregate  (cost=42103.26..42105.26 rows=200 width=16) (actual
>> time=94062.150..94062.161 rows=59 loops=1)
>>   Group Key: (st_valuecount(cv.rast, 1, true, NULL::double precision[],
>> '0'::double precision)).value
>>   Buffers: shared hit=35395
>>   ->  Seq Scan on calveg_whrtype_20m cv  (cost=0.00..11323.26
>> rows=2052000 width=12) (actual time=90.687..94056.706 rows=15812 loops=1)
>>         Buffers: shared hit=35395
>> Planning time: 0.213 ms
>> Execution time: 94062.215 ms
>>
>> 12.1
>>
>> HashAggregate  (cost=1088130.78..1088132.78 rows=200 width=16) (actual
>> time=44634.021..44634.031 rows=59 loops=1)
>>   Group Key: ((st_valuecount(cv.rast, 1, true, NULL::double precision[],
>> '0'::double precision))).value
>>   Buffers: shared hit=17664
>>   ->  Result  (cost=0.00..1057350.78 rows=2052000 width=12) (actual
>> time=348.651..44630.614 rows=15812 loops=1)
>>         Buffers: shared hit=17664
>>         ->  ProjectSet  (cost=0.00..10830.78 rows=2052000 width=32)
>> (actual time=348.637..44628.307 rows=15812 loops=1)
>>               Buffers: shared hit=17664
>>               ->  Seq Scan on calveg_whrtype_20m cv  (cost=0.00..47.52
>> rows=2052 width=31) (actual time=0.022..1.311 rows=2052 loops=1)
>>                     Buffers: shared hit=27
>> Planning Time: 0.381 ms
>> JIT:
>>   Functions: 12
>>   Options: Inlining true, Optimization true, Expressions true, Deforming
>> true
>>   Timing: Generation 3.299 ms, Inlining 10.618 ms, Optimization 124.105
>> ms, Emission 80.956 ms, Total 218.978 ms
>> Execution Time: 44637.499 ms
>>
>> Interesting about the stats improvement in 10. If/when I gain more
>> insight about this question, I'll keep this listed posted.
>>
>> Thanks,
>> Shira
>>
>> On Sat, Jan 11, 2020 at 3:54 AM Giuseppe Broccolo <g.broccolo.7 at gmail.com>
>> wrote:
>>
>>> Hi Shira,
>>>
>>> On Fri, 10 Jan 2020, 21:44 Shira Bezalel, <shira at sfei.org> wrote:
>>>
>>>> Hi Giuseppe,
>>>>
>>>> Thank you for your reply. I provided the EXPLAIN (ANALYZE ON, BUFFERS
>>>> ON) output in the links in my initial email. Is that not what you meant?
>>>>
>>>
>>> I was meaning the vanilla output returned by the EXPLAIN, not just the
>>> (optimal, BTW) Depesz one.
>>>
>>> Yes, the config between the two systems is very similar. I'm thinking
>>>> the shared buffers count differs between the two systems because the row
>>>> counts themselves are different. But why is that? I think I'll turn this
>>>> question over to the general Postgres Performance list now.
>>>>
>>>
>>> If the tables are identical in the two system, a different count of rows
>>> could be to a better statistical sampling, so the planner is more accurate
>>> in planning the aggregation. And this could make sense cause since
>>> PostgreSQL 10 statistics have been improved.
>>>
>>> Also, consider that even slight differences between the systems can lead
>>> to significant differences. Would be good here to understand what is
>>> different in the configuration.
>>>
>>> Anyway, please keep us updated from any insight if you find something :)
>>>
>>> Giuseppe.
>>> _______________________________________________
>>> postgis-users mailing list
>>> postgis-users at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>>
>>
>>
>> --
>> Shira Bezalel
>> Database Administrator & Desktop Support Manager
>> San Francisco Estuary Institute
>> www.sfei.org
>> Ph: 510-746-7304
>>
>>
>> _______________________________________________
>> postgis-users mailing list
>> postgis-users at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/postgis-users
>
> _______________________________________________
> postgis-users mailing list
> postgis-users at lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users

-- 
Shira Bezalel
Database Administrator & Desktop Support Manager
San Francisco Estuary Institute
www.sfei.org
Ph: 510-746-7304
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-users/attachments/20200114/eadd29bd/attachment.html>