[postgis-devel] [raster] Strategizing and specifications

Bryce L Nordgren bnordgren at gmail.com
Thu Jun 30 10:55:14 PDT 2011


Here's the situation:

I've gained "notional" support for funding small chunks of postgis raster as
"infrastructure investments", since we've been using postgis for seven years
or so. I've been here for the past month to understand its current state
and identify the "chunk" which will give us the most bang for the buck.
I've settled on tiled access to individual rasters as the first "chunk".

I'd prefer if we were funding a strategy arrived at by consensus, as the
community is going to have to maintain the final result. So, rather than me
telling you guys how to achieve this goal, I'll provide a few starting
points and see what the community comes up with. Keep in mind, during this
"strategizing phase", your time is not billable. You're doing it out of the
goodness of your heart. :) However, if there is a strategy in place when I
get back, there is at least a "target" to fund.

There are a few key aspects:

The backend: select a backend for in-database rasters. The backend must
support loading part of a raster. I would prefer a strategy which leverages
the current backend (TOAST), if TOAST can be made to load a partial raster.

The API: select an API for efficient raster access. I would prefer a
strategy which leverages an existing API, preferably an API which does not
add dependencies to postgis raster. A strategy which involves developing a
GDAL driver for the selected backend would be ideal. This API does not need
to be exposed to the end user.

Uniformity and ease of maintenance: the preferred strategy would treat
in-database and out-database rasters exactly the same. This is another
reason to prefer developing a GDAL driver for in-database rasters, since
GDAL will be used for out-database rasters. Ideally, the postgis raster
codebase (operations, accessors, etc.) will not have to be aware of the
difference.

No impact on existing functionality: all existing raster capabilities should
be unaffected.

Minimal impact on existing raster code: affected functions would ideally be
limited to the various accessor functions.

Stability: seek a way to implement something "quick and dirty", but rock
solid, for the PostGIS 2.0 release, such that a fuller, more comprehensive
solution can evolve over "point releases". If this isn't possible, then it
isn't possible, but it is what we should aim for.

Please modify this "ideal" strategy as you see fit. Exploring options in
parallel is encouraged at this stage. If consensus is reached on a strategy
while I am gone (I'll be unavailable until July 17th), please start fleshing
out a list of the changes necessary to implement the chosen strategy on the
"future version" wiki page, as indicated by Pierre, below. Be ready to tell
me how much the changes will cost and how long they will take.

Importantly, this is not a guarantee that we'll hit the deadlines for
PostGIS 2.0 or my own end-of-fiscal-year lockdown. If we miss these
deadlines, we may have a "hurry up and wait" situation. But at least a plan
will be in place.

On Thu, Jun 30, 2011 at 2:16 PM, Pierre Racine
<Pierre.Racine at sbf.ulaval.ca> wrote:

> -A clear plan of rt_api.c modifications.
> -A list of the existing functions being optimized with this raster
> arrangement: this is still not at all clear to me.
> -A list of the existing functions that might have their performance
> affected with other arrangements, if any.
> -A list of new functions allowing users to access those tiles (this is one
> of the goals, no?) with their signatures.
>
> Some constraints:
>
> -This should not affect the existing architecture, which allows creating a
> 32 TB tiled raster coverage with small tiles, or the performance of
> operations on this arrangement. Do we agree on that?
>

Yes.


> -You understand that you will never be able to store rasters bigger than 1
> GB in one row?
>

Yes. Bigger rasters can be divided using the existing mechanism.


> -If you want this to be part of PostGIS 2.0 you must implement it quickly
> and make sure it is VERY stable before the release (September - October).
> Otherwise it will have to wait for PostGIS 3.0 in a couple of years because
> a dump of stored rasters will be necessary. (Maybe not if your modifications
> support the present untiled rasters?)
>

Strategizers: this is why I'd prefer the TOAST backend if possible. Can we
make a GDAL driver that understands TOAST slices?
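
To make the question concrete, here is a rough sketch (Python, purely
illustrative, not the rt_api.c layout) of the byte ranges a driver would
have to request for one tile. It assumes an uncompressed band stored
row-major after a fixed-size header. The function name and the
header_bytes parameter are hypothetical. PostgreSQL can fetch a byte range
of a TOASTed value without reading the whole thing when the value is stored
uncompressed (EXTERNAL storage), so each (offset, length) pair below would
map onto one such slice request.

```python
from typing import List, Tuple


def tile_row_slices(raster_width: int, raster_height: int, pixel_bytes: int,
                    header_bytes: int, tile_x: int, tile_y: int,
                    tile_w: int, tile_h: int) -> List[Tuple[int, int]]:
    """Return one (offset, length) byte range per pixel row of the tile.

    Assumes a single uncompressed band stored row-major, starting
    header_bytes into the serialized value (a hypothetical layout).
    """
    if tile_w <= 0 or tile_h <= 0 or tile_x < 0 or tile_y < 0:
        raise ValueError("tile origin must be >= 0 and tile size must be > 0")
    if tile_x + tile_w > raster_width or tile_y + tile_h > raster_height:
        raise ValueError("tile extends past the raster bounds")

    row_stride = raster_width * pixel_bytes   # bytes in one full raster row
    run_length = tile_w * pixel_bytes         # bytes in one tile row
    slices = []
    for row in range(tile_y, tile_y + tile_h):
        offset = header_bytes + row * row_stride + tile_x * pixel_bytes
        slices.append((offset, run_length))
    return slices
```

A tile_h-row tile costs tile_h slice requests under this layout, which is
one argument for storing pixels tile-by-tile instead of row-by-row if slice
requests turn out to be expensive.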


> -We don't want to introduce a new TYPE here right? Like the Oracle RASTER
> type stored in other tables? That would be a serious drawback from my point
> of view. We are just speaking about tiling the linear array of bytes in a
> serialized raster. Right?
>

No new types in the API.


>
> Some open questions:
>
> -Would this tiling be systematic? Or would there be a minimum size for a
> raster to be tiled? What is the usefulness of tiling a 100x100 tile?
>

I would suggest that we provide reasonable defaults and the ability to
adjust them when the raster's created (or loaded). This may depend on the
tiling API which is selected.


> -How are we going to be able to set the tile size inside each raster?
> Should it be the same for each raster of a table? We probably want to
> specify this at loading. There must be a clear distinction between 1) This
> option will tile your raster over many rows and 2) This option will tile
> each raster internally.
>

The terminology must be nailed down. It's possible that "internal tiles"
could be entirely transparent. The API may support loading a "tile" from an
untiled data stream.
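
As an illustration of that last point, here is a minimal sketch (Python,
not a proposal for the actual API) of reading a "tile" out of an untiled,
row-major pixel stream by seeking to each row of the requested window. The
function name, the parameters, and the assumption of a seekable,
uncompressed, single-band stream are all mine. Tiles that overhang the
right or bottom edge are clipped to the raster.

```python
import io
from typing import BinaryIO


def read_tile(stream: BinaryIO, raster_width: int, raster_height: int,
              pixel_bytes: int, tile_x: int, tile_y: int,
              tile_w: int, tile_h: int, data_offset: int = 0) -> bytes:
    """Read a rectangular window from an untiled, row-major pixel stream.

    data_offset is where pixel data starts in the stream (hypothetical;
    a real serialized raster would have headers to skip).
    """
    # Clip the requested window to the raster so edge tiles are smaller.
    x_end = min(tile_x + tile_w, raster_width)
    y_end = min(tile_y + tile_h, raster_height)
    if not (0 <= tile_x < x_end and 0 <= tile_y < y_end):
        raise ValueError("tile does not intersect the raster")

    row_stride = raster_width * pixel_bytes
    run_length = (x_end - tile_x) * pixel_bytes
    rows = []
    for row in range(tile_y, y_end):
        # Only the bytes belonging to this tile row are read.
        stream.seek(data_offset + row * row_stride + tile_x * pixel_bytes)
        chunk = stream.read(run_length)
        if len(chunk) != run_length:
            raise EOFError("stream ended before the tile was complete")
        rows.append(chunk)
    return b"".join(rows)


# Usage: a 4x3 raster of 1-byte pixels whose values equal their index.
pixels = bytes(range(12))
window = read_tile(io.BytesIO(pixels), 4, 3, 1, 1, 1, 2, 2)
```

The caller only ever sees tiles. Whether the bytes underneath are tiled or
not stays an implementation detail of the backend.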

Bryce