[postgis-devel] [raster] Memory management and IO concerns

Pierre Racine Pierre.Racine at sbf.ulaval.ca
Thu Jun 30 07:16:23 PDT 2011


> 	I would put this as a future refactoring/optimization of the rt_api
> internals. But you will first have to justify why storing small tiles does not fulfill
> your application needs.
> 
> Ahh, so you're open to the possibility and wouldn't fight me if I wanted to fund
> this. :)

I'm certainly not against improving access performance for some arrangements, in this case a table of big rasters, one per row. But please, we want all this work to be very transparent. I think you should first provide (as a new objective on the future version wiki page):

-Clear justification: Why is it important to be able to store big images one per row, and why is it important to tile them?
-A clear plan of rt_api.c modifications.
-A list of the existing functions being optimized with this raster arrangement: this is still not clear to me.
-A list of the existing functions that might have their performance affected with other arrangements, if any.
-A list of new functions, with their signatures, allowing users to access those tiles (this is one of the goals, no?).

Some constraints: 

-This should not affect the existing architecture, which allows creating a 32 TB tiled raster coverage with small tiles, nor the performance of operations on this arrangement. Do we agree on that?
-You understand that you will never be able to store rasters bigger than 1 GB in one row?
-If you want this to be part of PostGIS 2.0 you must implement it quickly and make sure it is VERY stable before the release (September - October). Otherwise it will have to wait for PostGIS 3.0 in a couple of years, because a dump of stored rasters will be necessary. (Maybe not, if your modifications support the present untiled rasters?)
-We don't want to introduce a new TYPE here, right? Like the Oracle RASTER type stored in other tables? That would be a serious drawback from my point of view. We are just speaking about tiling the linear array of bytes in a serialized raster. Right?
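
To make that last point concrete, here is a minimal sketch of what "tiling the linear array of bytes" could mean for pixel addressing. It is written in Python for brevity and is not taken from rt_api.c; the function names, the tile-by-tile ordering and the assumption that the raster dimensions are exact multiples of the tile size are all hypothetical.

```python
def row_major_offset(x, y, width, pixel_size):
    """Byte offset of 0-based pixel (x, y) in a plain row-major band."""
    return (y * width + x) * pixel_size


def tiled_offset(x, y, width, tile_w, tile_h, pixel_size):
    """Byte offset of 0-based pixel (x, y) when the band is stored tile by tile.

    Assumption: width is a multiple of tile_w and the height is a multiple
    of tile_h, so every internal tile occupies the same number of bytes.
    """
    tiles_per_row = width // tile_w
    tile_col, in_x = divmod(x, tile_w)   # which tile, and where inside it
    tile_row, in_y = divmod(y, tile_h)
    tile_index = tile_row * tiles_per_row + tile_col
    tile_bytes = tile_w * tile_h * pixel_size
    # Skip the preceding tiles, then index row-major inside the tile.
    return tile_index * tile_bytes + (in_y * tile_w + in_x) * pixel_size
```

In both layouts the offset is a constant-time computation. What changes is which pixels end up next to each other in the byte array, and therefore how much of the array has to be read to serve a small window.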

Some open questions:

-Would this tiling be systematic? Or would there be a minimum size for a raster to be tiled? What is the use of tiling a 100x100 tile?
-How are we going to set the tile size inside each raster? Should it be the same for each raster of a table? We probably want to specify this at loading. There must be a clear distinction between 1) this option will tile your raster over many rows and 2) this option will tile each raster internally.

I'm not sure all this is worth the amount of work required, considering the good performance we have now, only to be able to store big rasters one per row... You will definitely have to be very convincing on all the points above.

I understand that some (many) people want to use PostGIS raster as a warehouse of big rasters (a series of Landsat images for example). For that you might want to store them one per row even if it is not necessary. But that was not the first intent of this project. The first intent is to be able to do raster/vector analysis over a consistent, huge raster coverage having the same theme. If we can do both very efficiently, bingo!

(I prefer to fight with you about your inefficient modular approach to seamless operators instead ;-) But this is another discussion...)

> 	Hum no... Both ST_SetValue(newrast, 1, x, y, newval) are O(1)
> operations... Since every raster has its own georeference and you can easily
> derive the memory location of the requested pixel value in O(1). Take also into
> account the fact that those two rasters are already loaded.
> 
> The index calculations are clearly O(1). DETOASTing and RETOASTING the entire
> raster are each O(N^2) operations, I presume, and this happens with every call
> to ST_SetValue()/ST_Value(). 

In a PL/pgSQL loop as you described, each raster would be DETOASTED only once and kept in memory, so that the following DETOAST would be almost instantaneous (O(1)). Unless I'm wrong. Don't underestimate PostgreSQL's capacity to reuse what has been loaded into memory...
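
For the index calculation itself, here is a small sketch of why the lookup is O(1) once the raster is in memory. It is illustrative Python, not the rt_api.c code; the function name is hypothetical, and it assumes a georeference with no skew, 0-based pixel indices and a row-major band.

```python
import math


def pixel_byte_offset(wx, wy, upper_left_x, upper_left_y,
                      scale_x, scale_y, width, height, pixel_size):
    """Byte offset of the pixel containing world point (wx, wy).

    Assumptions: no skew/rotation, 0-based (col, row) indices, and a
    row-major band. scale_y is typically negative for north-up rasters.
    """
    # World coordinates to pixel indices: two subtractions, two divisions.
    col = math.floor((wx - upper_left_x) / scale_x)
    row = math.floor((wy - upper_left_y) / scale_y)
    if not (0 <= col < width and 0 <= row < height):
        raise ValueError("point falls outside the raster extent")
    # Pixel indices to byte offset: constant time, whatever the raster size.
    return (row * width + col) * pixel_size
```

Nothing here depends on the number of pixels, so the cost that remains to be argued about is only the one of getting the serialized raster into memory in the first place.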

Pierre


