<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">

<HTML>

<HEAD>

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">

<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7654.12">

<TITLE>RE: [postgis-devel] reuse of values</TITLE>

</HEAD>

<BODY>

<!-- Converted from text/plain format -->

<P><FONT SIZE=2>Nicklas,<BR>

Interesting observation.  I never noticed that before but have noticed similar type of behavior.  At first I thought it was the SUM causing the issue, but taking that out does not narrow the gap between the two. <BR>

<BR>

Even dumping in a subselect doesn't seem to resolve the issue.  I suspect this is a pure PostgreSQL question though but would have to experiment with costly non-PostGIS functions to be sure.<BR>

<BR>

It seems you have to throw an order by in the subselect to trick the planner into materializing the subselect and once its materialized its using a stored value.  I thought perhaps setting the cost on st_distance calc to high would help but didn't.  This is probably a question for Tom Lane.<BR>

<BR>

--reduces time to 797ms frp, 1297 ms<BR>

SELECT sum(dist+1) as theFirst,sum(dist+2) as theSecond<BR>

FROM<BR>

(select st_distance(linje1, linje2) as dist<BR>

 from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1) table2 ORDER BY 1) As foo;<BR>

 <BR>

--reduces time to 1000ms from 1297 ms<BR>

select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1 ORDER BY 1) table2;<BR>

<BR>

<BR>

--original 1297 ms<BR>

select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1 ) table2;<BR>

<BR>

Thanks,<BR>

Regina<BR>

<BR>

<BR>

-----Original Message-----<BR>

From: postgis-devel-bounces@postgis.refractions.net on behalf of nicklas.aven@jordogskog.no<BR>

Sent: Fri 4/17/2009 6:46 PM<BR>

To: postgis-devel@postgis.refractions.net<BR>

Subject: [postgis-devel] reuse of values<BR>

<BR>

<BR>

Hallo<BR>

<BR>

I have a question that I don't know if it is a pure postgresql-question.<BR>

Where in the process and how, does the planner descide what values to reuse in other parts of a query.<BR>

I mean, if I do some calculation in the select-part and I want to use exectly the same calculation in the where-part. Will the value be recalculated or reused?<BR>

<BR>

in this example the querytime is almost the same for :<BR>

<BR>

select sum(st_distance(linje1, linje2)+2) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1) table2<BR>

<BR>

and<BR>

<BR>

<BR>

select sum(st_distance(linje1, linje2)+2) as theFirst from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1) table2;<BR>

<BR>

it takes in both cases about 600ms at my computer. this I interpret as the planer reuses the value of st_distance from theFirst to theSecond, but if I change the +2 in one of them to +1 like this:<BR>

<BR>

select sum(st_distance(linje1, linje2)+1) as theFirst,sum(st_distance(linje1, linje2)+2) as theSecond from<BR>

 (select st_makeline(st_point(a,b),st_point(b,a)) as linje1,st_makeline(st_point(c,d),st_point(d,c)) as linje2  from<BR>

 (select generate_series(1,200000,2) as a,<BR>

 generate_series(1,300000,3) as b,<BR>

 generate_series(1,400000,4) as c,<BR>

  generate_series(1,500000,5) as d) table1) table2;<BR>

<BR>

then it suddenly takes over 1000ms.<BR>

<BR>

My conclusion is that the planer knows how to reuse the whole value but not just the st_distance-calculation. So if I change some part it has to recalculate st_distance all over again. (The difference of 400ms is not from the calculation of +1 because the same query with just the st_distance-calculation without the addition-exercise also uses about 600ms, so the addition seems to be done in no time. but it distracts the planner)<BR>

Am I right?<BR>

is there any trick how to handle it in more time-consuming examples.<BR>

<BR>

<BR>

/Nicklas<BR>

<BR>

<BR>

<BR>

</FONT>

</P>

</BODY>

</HTML>

<HTML><BODY><P><hr size=1></P>

<P><STRONG>

The substance of this message, including any attachments, may be confidential, legally privileged and/or exempt from disclosure pursuant to Massachusetts law. It is intended solely for the addressee. If you received this in error, please contact the sender and delete the material from any computer.

</STRONG></P></BODY></HTML>

<P><hr size=1></P>

<P><STRONG><font size="2" color="339900"> Help make the earth a greener place. If at all possible resist printing this email and join us in saving paper. </p> <p> </font></STRONG></P>