[postgis-users] Much Faster Point in Polygon Count using KEYWORD LATERAL
Paragon Corporation
lr at pcorp.us
Fri Mar 13 21:29:23 PDT 2015
Mark,
See below comments.
> Thanks Regina for your reply - some of these nuances are rather new to me,
so I'll work through each of your points, sharing with you the query planner
results. It seems the GroupAggregate cost is the root cause of the
performance difference.
>> It would be interesting to see what happens if you experiment with
disabling
>> certain strategies. I think a non-LATERAL has a lot more options as to
the
>> approach it takes e.g. nested vs. hash, group agg etc where as the
LATERAL I
>> think goes by a nested loop strategy. So it might be partly the plan it
>> takes as well and you may achieve similar performance benefits by say
>> disabling group_agg.
>> How do you explicitly disable groupagg? I couldn't find any reference -
is it a case of SET enable_hashagg=ON ? And setting ample working memory?
>> SET enable_hashagg=ON
> >SET work_mem = '1000MB';
I misspoke on this one, I guess groupagg is not one you can disable, but yes
that was along the lines of what I was thinking and best you can do as you
did is increase work_mem to sway it in the hash_agg direction
Or possible group by an additional key (like gid or some other non-spatial
column)
> Poorly worded on my part. I did read somewhere that GIST indexes didn't
have any effect on LEFT JOINs - possibly the world has moved on, but I still
see this message:
> NOTICE: gserialized_gist_joinsel: jointype 1 not supported
> This message doesn't appear on INNER JOIN - only LEFT JOIN
> Query times are identical which suggest the NOTICE is superfluous.
Yah that message has existed since dawn of PostGIS and upsets everyone. I
ran into it and got scared to, but as you surmised it's a harmless notice.
We just don't have any specific selectivity for LEFT JOIN so it always uses
the same approach as INNER JOIN even if that might not be the best.
What we don't support is FULL JOIN (last I bothered trying, I don't think
you can even DO a FULL JOIN with a spatial construct in the ON clause).
That I've been waiting for for 10 years until I decided I could live without
it and stopped complaining about it. FULL JOIN is a rare case I come across
every once in a while but hasn't bugged me recently.
>> 4) The LEFT JOIN behavior you are getting with doing count in your
LATERAL
>> query is a side-affect of aggregation and not really the way you'd
approach
>> a LEFT JON in LATERAL general case.
> I see the nuance. INNER JOIN it should be :-)
INNER JOIN and CROSS JOIN (aka just , ) with LATERAL are exactly the same.
I sometimes write it as a CROSS JOIN and , just to save myself the annoyance
of having to specify an unneeded ON clause.
LEFT JOIN/ RIGTH JOIN is the only case where you have to explicitly write
LEFT JOIN (and of course have to include an ON when you do that since the
SQL specs require it)
Unless there's another way of avoiding the performance cost of GROUP BY,
then the query times of INNER JOIN LATERAL seem really compelling...
Anyway great to see you are doing performance checks on these. I was
meaning to particularly setting up a benchmark of regression tests to make
sure performance of PostGIS doesn't deteriorate as we add new features.
That's something we are pretty weak at these days and rely to much on user
comments (which is a bit too late) when these issues arise. I'm hoping to
do that come before PostGIS 2.2. release especially since strk and pramsey
are doing a lot of work with optimizer stats and new spgist type.
Thanks,
Regina
http://www.postgis.us
http://postgis.net
More information about the postgis-users
mailing list