[postgis-users] How to handle tiled rasters?

Fri Jan 7 07:06:13 PST 2011

Stefan,

I'm back from vacation and I think I owe you some explanations/justifications...

>The most important (conceptually motivated) features I'm after are
>1. Keeping the user away from performance decisions as long as
>possible (thus e.g. the "tiled/not tiled" field in metadata).

>2. Being able to express the fact that the modeled coverage is
>overlapping or not.

Please explain to me what performance gain an application gets from knowing whether a set of tiles is regularly tiled or not. I have been asking this for a while and nobody has ever dared to explain it to me.

For me a GIS application has two choices:

1) Display many rasters independently. Many rasters = many layers = many themes = many symbologies (even if the theme and the symbology are generally the same). In this case what an application wants to do is query all the tiles (or rasters) forming one raster, display them as a single image, and do the same for each image.

2) Display many rasters as a single coverage. Many rasters = one layer = one theme = one symbology. In this case what an application wants to do is query all the tiles (or rasters) of the coverage that fall within the display and draw them one after the other, independently (since it cannot rely on their order or location).

1 is what I call the "big image" use of rasters. This is still the way ArcGIS displays images, for example (in v. 10 they introduced mosaics. Wow! Ten years too late). From my point of view, it's a shame that so many GIS applications still use images this way.

2 is what I call the coverage use of rasters. This is much more similar to the way applications treat vector coverages. In my opinion, and considering the raster datasets now available on the web, this is a much more practical and modern way to use rasters. 1 dates from when there were no preprocessed raster coverages available and people had to build their own raster mosaics. But reality has changed and now, more and more, people use preprocessed raster coverages, and those coverages are not necessarily rectangular areas (I can have a complex polygonal composition of thousands of non-overlapping rasters covering the region of Quebec, for example).

I view 2 as a more flexible generalisation of 1, in that if you are able to deal with 2, you are also able to deal with 1. 1 becomes identical to 2 if there is only one image involved. If there are many, then it's just case 2 many times over. The only difference (and I think the source of the problem) is that a type 1 application will try to fit all the received tiles into one big raster. If those applications instead behaved like 2 (displaying tiles one after the other, each where it has to be displayed, which is the way web applications work), treating any table of tiles like 2 would always work without knowing whether the table is tiled or not.
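To make approach 2 concrete, here is a minimal sketch in Python. It is only an illustration: the Tile class, the tiles_to_draw function and the bounding-box values are all made up for this example and are not part of PostGIS. The point it tries to show is that the client only needs each tile's own extent, never a "regularly tiled" flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    """Hypothetical tile: an id plus its own bounding box in map units."""
    tile_id: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(tile, view):
    """True when the tile's bounding box overlaps the view rectangle."""
    vxmin, vymin, vxmax, vymax = view
    return (tile.xmin < vxmax and tile.xmax > vxmin and
            tile.ymin < vymax and tile.ymax > vymin)


def tiles_to_draw(tiles, view):
    """Keep the tiles touching the view, in the order they were received.

    Each tile is later painted at its own extent, so nothing here assumes
    a regular grid, equal tile sizes, or non-overlapping tiles.
    """
    return [t for t in tiles if intersects(t, view)]


tiles = [
    Tile(1, 0, 0, 10, 10),
    Tile(2, 10, 0, 25, 10),        # different width: not a regular grid
    Tile(3, 8, 8, 12, 12),         # overlaps tiles 1 and 2
    Tile(4, 100, 100, 110, 110),   # far outside the view
]
visible = tiles_to_draw(tiles, view=(5, 5, 20, 20))
print([t.tile_id for t in visible])
```

With a single big image in the table the same loop simply returns one tile, which is why case 1 falls out of case 2 for free.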

What if two images overlap? The application should just propose a rule: display the FIRST pixel value, the LAST pixel value, or the MAX, MIN or MEAN of the pixel values.
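Such a rule could be sketched like this in Python. The rule names come from the list above; the RULES table and the resolve_pixel function are hypothetical names for the sake of the example, and the values are assumed to arrive in the order the rasters were read.

```python
from statistics import mean

# One small function per display rule, keyed by the rule name.
RULES = {
    "FIRST": lambda values: values[0],
    "LAST": lambda values: values[-1],
    "MAX": max,
    "MIN": min,
    "MEAN": mean,
}


def resolve_pixel(values, rule):
    """Combine the values that overlapping rasters supply for one pixel.

    `values` holds one value per raster covering the pixel, in read order.
    Returns None when no raster covers the pixel.
    """
    if not values:
        return None
    return RULES[rule](values)


stacked = [10, 40, 25]   # three rasters overlap at this pixel
for rule in RULES:
    print(rule, resolve_pixel(stacked, rule))
```

Note that FIRST and LAST depend on the order in which the tiles are returned, so an application offering them would probably want an explicit ordering in its query.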

I think much of the confusion comes from the fact that we are very much used to treating rasters like 1 and to expecting one raster table to form a nice rectangular image. It's an error to see a PostGIS raster table as such an image format. You must see a PostGIS raster table as a way to store rasters, period, exactly like a filesystem is a way to store files. So yes, an application must be able to deal with any raster arrangement found in a table, and 2 is the most generic way to do this.

Now the geometry analogy: applications do not need to know whether the geometries in a PostGIS geometry layer overlap or not, or are topological or not. They just display them one after the other. You must see a single tile in PostGIS raster as you see a single geometry. Applications should do the same as when they display geometries and draw the tiles one after the other. Again, this is what web applications do and everybody is happy.

I should actually advocate more strongly that a PostGIS raster table IS NOT an image. It is a bunch of rasters (or call them tiles). Period. Display them as if they were independent rasters. Period. The same way a geometry table is a bunch of geometries. Do they form a rectangle? Is there only one or many? Do they overlap? Are they topological? There is no flag in PostGIS to indicate that, nobody has ever complained, and applications work very well. This is because we do not assume any interdependent structure between the geometries. We should do the same with rasters stored in PostGIS raster.

>Anyway, to me characteristic (d) "rectangular regularly tiled raster
>coverage" from http://trac.osgeo.org/postgis/wiki/WKTRaster/Documentation01
>should be supported in a simple and efficient way as possible.
>
>Regarding 2.
>Having only one single raster file (as input) is not the main use case
>I have in mind. Often there are several raster files. Putting one
>single large raster image into one row of a raster table seems to me
>like storing the whole coastline of Finland as one row into a vector
>table (which I wouldn't). Another use case are completely independent
>(probably overlapping and hopefully not too big) X-ray images or
>holiday pictures. All this would be stored as separate rows in a
>single raster table - but there is a difference, isn't it?

If you have many very big images representing different themes and you want them tiled, there are two solutions:

1) If you consider your raster objects to represent very different entities, store them in different tables.

2) If you consider your raster objects to represent very similar entities, store all the tiles in the same table with a foreign key and create an Oracle GeoRaster-like relational schema with a table listing the individual images.
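As a rough sketch of what solution 2 could look like, here is a two-table layout built with Python's sqlite3 module, used only as a stand-in so the example runs anywhere. The table and column names (image, tile, image_id, rast) are hypothetical, the BLOB column merely stands in for a PostGIS raster column, and this is not Oracle GeoRaster's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- One row per source image.
    CREATE TABLE image (
        image_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    -- One row per tile; image_id is the foreign key back to its image.
    CREATE TABLE tile (
        tile_id  INTEGER PRIMARY KEY,
        image_id INTEGER NOT NULL REFERENCES image(image_id),
        rast     BLOB    -- placeholder for the raster data
    );
""")
conn.executemany("INSERT INTO image VALUES (?, ?)",
                 [(1, "scene_a"), (2, "scene_b")])
conn.executemany("INSERT INTO tile (image_id, rast) VALUES (?, ?)",
                 [(1, b"\x00"), (1, b"\x01"), (2, b"\x02")])


def tiles_of(conn, name):
    """Return the ids of all tiles belonging to the named image."""
    rows = conn.execute(
        "SELECT t.tile_id FROM tile t "
        "JOIN image i ON i.image_id = t.image_id "
        "WHERE i.name = ? ORDER BY t.tile_id",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]


print(tiles_of(conn, "scene_a"))
print(tiles_of(conn, "scene_b"))
```

All the tiles still live in one table, so an application that ignores the image table can keep displaying them one after the other as a single coverage.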

PostGIS was made for geospatial applications (even if it can be used for other applications), and a geospatial table generally holds one theme, stored in one table. That implies a solution different from, and simpler than, 1 and 2.

>Regarding 1.
>* To me one single raster table represents part of or at most one
>coverage (i.e. I would'nt put many conceptually different coverages in
>one table).

We agree on this.

>* Drawing analogies to vector geometries: There too the index cares
>about lines that are "chopped" into "segments".

What do you mean?

>* It would be a conceptual asymmetry to me if users (when querying)
>don't have to care about vector index options nor about vector/raster
>representation, but had to keep in mind if the raster they are
>querying is tiled or non-tiled.

Considering what I explained at the beginning, applications should not have to know whether the coverage is tiled or not. Just display one tile at a time. Don't try to write them into a big raster structure. This is why I have always been against this regular_blocking flag. Vector applications do not need it and work very well.

>If tiled/not tiled can be deduced then it's ok IMHO; but this must be
>reliable. If not, then why maintain raster_columns anyway?

You can never guarantee that the raster_columns table is 100% in sync.

>I would be the first to get rid of this metadata (being neither a user
>nor a postgres system table). I actually really prefer
>Populate_Geometry_Columns() over AddRasterColumn() and I hope there
>will be an analagous function fo raster (probably called
>Populate_Raster_Columns).

We can add this to our task list.

Hope this explains the why, and the pros and cons, of some decisions.

Pierre