[postgis-devel] reuse of values

Obe, Regina robe.dnd at cityofboston.gov
Sat Apr 18 10:16:38 PDT 2009


Nicklas,
Interesting observation.  I never noticed that before but have noticed similar type of behavior.  At first I thought it was the SUM causing the issue, but taking that out does not narrow the gap between the two.  

Even dumping in a subselect doesn't seem to resolve the issue.  I suspect this is a pure PostgreSQL question though but would have to experiment with costly non-PostGIS functions to be sure.

It seems you have to throw an order by in the subselect to trick the planner into materializing the subselect and once its materialized its using a stored value.  I thought perhaps setting the cost on st_distance calc to high would help but didn't.  This is probably a question for Tom Lane.

--reduces time to 797ms frp, 1297 ms
SELECT sum(dist+1) as theFirst,sum(dist+2) as theSecond 
FROM 
(select st_distance(linje1, linje2) as dist
 from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1) table2 ORDER BY 1) As foo;
  
--reduces time to 1000ms from 1297 ms
select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1 ORDER BY 1) table2;


--original 1297 ms
select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1 ) table2;

Thanks,
Regina


-----Original Message-----
From: postgis-devel-bounces at postgis.refractions.net on behalf of nicklas.aven at jordogskog.no
Sent: Fri 4/17/2009 6:46 PM
To: postgis-devel at postgis.refractions.net
Subject: [postgis-devel] reuse of values
 

Hallo
 
I have a question that I don't know if it is a pure postgresql-question.
Where in the process and how, does the planner descide what values to reuse in other parts of a query.
I mean, if I do some calculation in the select-part and I want to use exectly the same calculation in the where-part. Will the value be recalculated or reused?
 
in this example the querytime is almost the same for :
 
select sum(st_distance(linje1, linje2)+2) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1) table2

and
 

select sum(st_distance(linje1, linje2)+2) as theFirst from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1) table2;

it takes in both cases about 600ms at my computer. this I interpret as the planer reuses the value of st_distance from theFirst to theSecond, but if I change the +2 in one of them to +1 like this:
 
select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from 
 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from
 (select generate_series(1,200000,2) as a,
 generate_series(1,300000,3) as b,
 generate_series(1,400000,4) as c,
  generate_series(1,500000,5) as d) table1) table2;

then it suddenly takes over 1000ms. 
 
My conclusion is that the planer knows how to reuse the whole value but not just the st_distance-calculation. So if I change some part it has to recalculate st_distance all over again. (The difference of 400ms is not from the calculation of +1 because the same query with just the st_distance-calculation without the addition-exercise also uses about 600ms, so the addition seems to be done in no time. but it distracts the planner)
Am I right?
is there any trick how to handle it in more time-consuming examples.
 
 
/Nicklas
 




-----------------------------------------
The substance of this message, including any attachments, may be
confidential, legally privileged and/or exempt from disclosure
pursuant to Massachusetts law. It is intended
solely for the addressee. If you received this in error, please
contact the sender and delete the material from any computer.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20090418/8d262185/attachment.html>


More information about the postgis-devel mailing list