[postgis-devel] reuse/caching of values in a query

nicklas.aven at jordogskog.no nicklas.aven at jordogskog.no
Wed Jul 8 12:56:13 PDT 2009


Hallo

I would like to raise this issue again. I know the problem is mostly at the postgresql side of the fence, but maybe we can find some workaround or get postgresql-developers to look at it. There were some discussions in the springtime where me and Regina saw some, little bit strange behavior from the planner.

The issue is about when the same calculation is used many times in the same query.
an example:

select st_area(the_geom) from table1 where st_area(the_geom) > 10000;

the area seems to be calculated twice.

another example:
select st_area((st_dump(the_geom)).geom), (st_dump(the_geom)).path from table1;

this query takes about the double querytime compared to:

select st_area((st_dump(the_geom)).geom), '{}' from "GSDmarkp"
i added '{} just to get the same output so that part is constant.

This means as I understand it that st_dump have to do the job twice for each row.

I think this is quite a big problem. In complex queries the same iterations and calculations will be done many times. Like the example of st_dump, all the information is there the first time, but there is no way to get it out.
In some cases it is not even possible to use a subquery to make it u the calculation just once.
like this:
select area from 
(select st_area(the_geom)::int as area from table1) a
where area >10000

Here I think it starts to look like a bug. Why is it calculating the area twice here?

Why I think this is an important question:
I'm looking at the distance-functions. The distace-calculations is probably one of the more power consuming functions in postgis because it has to iterate through all the vertexes in both geometries so two geometries with 1000 vertexes each will generate 1 milj iterations. When we do this iterations it is very tempting to get more information at the same time. It would be almost as fast to get maxdistance, mindistance, shortestline, longestline as just get mindistance. All the calculations is already done it is just a matter of catching the results. I guess there would also be no problem to output the result the same way as st_dump outputs two different results, geom and path. In this case we could have a function that results all the distance-calculations as an array. The only problem is to get it out without doing the same thing again and again. 

Regina and Leo wrote a greate blog about it a few months ago. 
http://www.postgresonline.com/journal/index.php?/archives/113-How-to-force-PostgreSQL-to-use-a-pre-calculated-value.html
but there has been no bigger reaction. 
I guess it shouldn't be to impossible when I have understood they can cache values between different rows, then it would be possible to reuse them in the same row.

I'm I wrong thinking it is an important matter for postgis as a calculation-intence part of the postgresql-world?

Greetings
Nicklas







-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20090708/4160216f/attachment.html>


More information about the postgis-devel mailing list