[postgis-devel] [raster] Memory management and IO concerns

Paul Ramsey pramsey at opengeo.org
Thu Jun 23 15:35:14 PDT 2011


Once you DETOAST_DATUM, it's loaded in. However you do have the option
of just doing DETOAST_DATUM_SLICE. Obviously, this begins to
complicate things :) however where performance is paramount, it's
worth doing. The index code detoasts just the first slice of a
gserialized that contains the bounding box, for example.

P.

On Thu, Jun 23, 2011 at 3:20 PM, dustymugs <dustymugs at gmail.com> wrote:
> Hey Bryce,
>
> On 06/23/2011 02:20 PM, Bryce L Nordgren wrote:
>>
>> A nasty thought was nagging at me the entire time I was writing raster
>> code.
>> It mostly has to do with "how many copies of this raster am I holding in
>> memory at the same time". This arises from the basic flow:
>>
>> 1] Get a (rt_pgraster*) from the PG_FUNCTION_ARGS
>> 2] Get a rt_raster by deserializing the (rt_pgraster*)
>> 3] Get a GDALDatasetH by opening the rt_raster using the "mem" driver.
>>
>> So I poked around and #1 is a hollow shell with just a few metadata
>> fields.
>> #2 is "all or nothing": either you get just the header metadata or you
>> load
>> all the data from all the bands into memory.  I hope (but don't know) that
>> #3 simply manipulates the data/metadata in the rt_raster/rt_band buffer
>> (e.g., hope that it doesn't make ANOTHER COPY.)
>>
>
> Assuming you're using rt_raster_to_gdal_mem, the band data in GDALDatasetH
> just points to the appropriate address of the band data in rt_raster.
>
> I've been thinking about expanding rt_raster_deserialize to allow more than
> the "all or nothing" approach with a selective list of the bands to
> deserialize, but isn't at the top of my todo right now.  Adding it wouldn't
> make more than an hour or two of work though.
>
>> I was worried because I have three rasters active for the duration of my
>> call (two inputs and one output), two of which have open GDALDatasetH
>> handles, and I could foresee running out of memory pretty quickly.
>> However,
>> this may have serious performance implications for anything written in
>> SQL...for IO reasons instead of memory reasons.  If you take the core loop
>> of the mapalgebra code as typical (and it should be, for anything that
>> loops
>> over all the cells), you have:
>>
>>         FOR x IN 1..newwidth LOOP
>>>
>>>             FOR y IN 1..newheight LOOP
>>>                 r1 := ST_Value(rast1, band1, x - rast1offsetx, y -
>>> rast1offsety);
>>>                 r2 := ST_Value(rast2, band2, x - rast2offsetx1, y -
>>> rast2offsety1);
>>>                 ---- insert code here
>>>
>>>                 newrast = ST_SetValue(newrast, 1, x, y, newval);
>>>             END LOOP;
>>>         END LOOP;
>>>
>> Each call to ST_Value equals "loading all data from all bands into
>> memory".
>> Each call to ST_SetValue equals "loading all data from all bands into
>> memory, then saving everything back out to the postgres backend". That's a
>> LOT of I/O to read two pixels and set one. Also, if you assume a square
>> raster (or any fixed aspect ratio), this is an O(N^2) situation, where N
>> is
>> the length of one of the dimensions. When things start getting slow,
>> they'll
>> screech to a halt. Worse: the PostGIS backend will act as a bottleneck, so
>> two separate processes operating on different rasters will likely be
>> waiting
>> on the same disk for their IO.
>>
>> It would seem that adding the ability to "deserialize" a partial raster
>> has
>> the potential to vastly improve performance from SQL (by fixing an IO
>> issue), and to a lesser extent, C (by fixing a memory issue). It would
>> also
>> allow larger rasters to be manipulated/operated on.
>>
>> Is it possible to implement a server-side GDAL driver for rasters stored
>> in
>> PostGIS? GDAL seems to have the ability to load/store blocks of image data
>> (perhaps even a "block cache", if I saw correctly?) What is the current
>> thinking on this issue?
>>
>
> I can't be certain but the rt_pgraster object received in the backend is
> already in memory from PostgreSQL, so there isn't much memory savings
> possible.  The ability of GDAL for interacting with blocks of data is of
> benefit when you're working against file rasters on the filesystem that
> aren't loaded into memory and can be accessed as a stream.
>
> -bborie
> _______________________________________________
> postgis-devel mailing list
> postgis-devel at postgis.refractions.net
> http://postgis.refractions.net/mailman/listinfo/postgis-devel
>



More information about the postgis-devel mailing list