[postgis-devel] [WKT Raster] Regular blocking in gdal2wktraster.py

Pierre Racine Pierre.Racine at sbf.ulaval.ca
Fri Mar 20 14:40:50 PDT 2009


Mateusz,

Since a picture is worth a thousand words, I updated the PowerPoint to better express the different ways raster datasets can be arranged. Instead of the three categories ("continuous tiled coverage", "vector-like discrete coverage" and "image warehouse") I created five:

1 - As an incomplete non-overlapping tiled coverage
2 - As a complete non-overlapping tiled coverage
3 - As a layer of vector-like discrete raster objects
4 - As a “big raster” (or a series of)
5 - As an image warehouse

I think this is a much better representation of reality and covers 99% of our users' cases.

I uploaded a PDF version to the WKT Raster home page (it is also attached). Please have a look.

In brief, I think gdal2wktraster.py must help us load datasets of type 1 without having to rely on other tools.

>> Let's put it like this:
>>
>> -I name a "big raster" the conventional idea we have about an image. It
>> is rectangular (or square) and you want to store all the tiles it is
>> composed of (or that you wish to create). However reality is often more
>> complex...
>
>Clear.
>
>> -A "coverage" might not be rectangular at all (because it is composed
>> of many rasters covering a terrestrial surface not necessarily
>> rectangular). Still, like big rasters users, you want to be able to
>> import it all, tiled (or retiled) into a single table without having to
>> write a batch.
>
>Clear.
>
>Now, does regular blocking concept apply only to 1st or to both cases?
>Perhaps there is misunderstanding in what regular blocking really means here.

In many cases where I load a coverage, I expect the tiles to follow a regular grid (as in datasets of type 2) but not necessarily a complete one (datasets of type 1), with many edge tiles missing. I understand that if the application expects a complete rectangular grid, it will have to fill the gaps (with nodata values) one way or another.
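
To illustrate what "filling the gaps" could look like on the application side, here is a minimal Python sketch. It assumes tiles are keyed by their (column, row) position in the regular grid; the function name and the data layout are hypothetical and are not part of gdal2wktraster.py or WKT Raster.

```python
def fill_gaps(tiles, n_cols, n_rows, tile_w, tile_h, nodata):
    """Return a complete n_cols x n_rows grid of tiles.

    `tiles` maps (col, row) -> tile, where a tile is a list of tile_h rows
    of tile_w pixel values. Missing grid cells get a tile full of `nodata`.
    """
    complete = {}
    for row in range(n_rows):
        for col in range(n_cols):
            tile = tiles.get((col, row))
            if tile is None:
                # No tile stored for this cell: synthesize a nodata tile.
                tile = [[nodata] * tile_w for _ in range(tile_h)]
            complete[(col, row)] = tile
    return complete


# An incomplete 2 x 2 coverage of 2 x 2-pixel tiles: two cells are missing.
incomplete = {
    (0, 0): [[1, 2], [3, 4]],
    (1, 1): [[5, 6], [7, 8]],
}
complete = fill_gaps(incomplete, n_cols=2, n_rows=2,
                     tile_w=2, tile_h=2, nodata=-9999)
```

The stored table stays incomplete (type 1); only the client that needs a full rectangle pays for the padding.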

>I understand regular blocking as a regular grid in which each cell is filled
>with a tile of raster. Later, it seems reasonable to support NULL-tiles - a cell
>of grid that does not contain any tile.
>Simply, regular blocking seems to be an optimization for the more-or-less common
>case of storing and accessing big raster data.

From my point of view, and since most coverages are not necessarily rectangular (datasets of type 1), I would expect the application to fill the gaps itself. Actually, a non-rectangular coverage should be flagged with regular_blocking = false, and the application should then load the tiles one by one without expecting the first tile to be the upper left one. There could also be three options for regular_blocking:

-one for "regular and complete rectangular blocking" (datasets of type 2 and 4). Tiles are all the same size and the first one is the upper left one. This is generally the case when you store big rasters.
-one for "regular but incomplete blocking" (datasets of type 1). Tiles are all the same size but not necessarily in a particular order. This is generally the case when you store a coverage.
-one for "don't expect any regular blocking" (datasets of type 3). Tiles might be of different sizes and in any order. This is the result of a polygonization operation where the size of each tile depends on the extent of the converted feature (one feature = one row = one raster tile). This is why there is one width and one height for each raster row (or tile) in WKT Raster.
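
The three options above could be told apart roughly as in the following Python sketch. It is only an illustration under stated assumptions: tiles are given as (x, y, width, height) in integer pixel units, they do not overlap, and tile order is ignored (so the "first tile is the upper left one" condition is not checked). The names `Blocking` and `classify` are hypothetical.

```python
from enum import Enum


class Blocking(Enum):
    REGULAR_COMPLETE = "regular and complete rectangular blocking"  # types 2 and 4
    REGULAR_INCOMPLETE = "regular but incomplete blocking"          # type 1
    NONE = "don't expect any regular blocking"                      # type 3


def classify(tiles):
    """Classify a list of (x, y, width, height) tiles in pixel units."""
    sizes = {(w, h) for _, _, w, h in tiles}
    if len(sizes) != 1:
        # Tiles of different sizes (or no tiles at all): no regular blocking.
        return Blocking.NONE
    (w, h), = sizes
    x0 = min(x for x, _, _, _ in tiles)
    y0 = min(y for _, y, _, _ in tiles)
    # Every tile must sit on the grid anchored at the upper left corner.
    if any((x - x0) % w or (y - y0) % h for x, y, _, _ in tiles):
        return Blocking.NONE
    cells = {((x - x0) // w, (y - y0) // h) for x, y, _, _ in tiles}
    n_cols = max(c for c, _ in cells) + 1
    n_rows = max(r for _, r in cells) + 1
    if len(cells) == n_cols * n_rows:
        return Blocking.REGULAR_COMPLETE
    return Blocking.REGULAR_INCOMPLETE


full = classify([(0, 0, 10, 10), (10, 0, 10, 10),
                 (0, 10, 10, 10), (10, 10, 10, 10)])
partial = classify([(0, 0, 10, 10), (10, 10, 10, 10)])
irregular = classify([(0, 0, 10, 10), (3, 0, 7, 5)])
```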

>> >I'm following principles of prototyping, and I'm not trying to draw a
>> >big-ass plan and then start developing it. I'm making very small
>> >steps, so I can change direction quickly.
>>
>> I understand, but planning while keeping in mind that at some point we have to
>> support off-db tiles (which is one of the main features of WKT Raster)
>> is maybe better.
>
>Possibly, however, there is plenty of work to get done for in-db rasters
>so I hope you don't mind I will stay focused on that part.
>
>BTW, you have mentioned use of GDAL in WKT Rasters. This decision seems to
>be crucial for further development. Perhaps, this is what should be
>planned sooner than later.

I plan to link with GDAL to implement:

-those three functions (rt_band_get_data(), rt_band_set_pixel() and rt_band_get_pixel())

-RT_AsPolygon() and RT_AsRaster() (and probably other functions as far as GDAL offers useful services)

-I would also link with it to implement AsGdal() (or AsImage()) to write a general image conversion function (as I wrote in the specs page) instead of relying on the TIFF and JPEG libraries to write individual AsJPEG and AsTIFF functions.

>> >My plan about gdal2wktraster is as short as this summary:
>> >
>> >1) Load 1..N raster files into a single table, one raster file per
>> >row. It's a user's decision, if he wants to load N
>> >unrelated/different/heterogeneous raster files into single table, or
>> >if he wants to keep things in some order. With current gdal2wktraster
>> >user has a liberty and tool to achieve both.
>> >
>> >2) Load 1 raster file according to principles of regular blocking we
>> >have already drawn. So, 1 raster is tiled according to block size
>> >and tiles are loaded into a single table (one tile per row).
>> >So, the whole table makes a regular grid of non-overlapping tiles.
>>
>> And what if I want to store many rasters as 2) but in the same table?
>> This is not a zillions of exceptions, this is just one more use case.
>> And as a simple user this is the case I will be encountering the most.
>
>Pierre, I'm confused.
>
>You ask for: "I want to store many rasters in the same table"
>
>Case 1) says: "Load 1..N raster files into a single table"
>
>I can't see any difference, really.
>
>First, you can store many rasters in case 1) as well (see 1..N).
>Second, in both cases, 1) and 2), all rasters/tiles are stored in a single table.
>The only difference in case 2) is that some additional restrictions apply.
>These restrictions are listed in the RASTER_COLUMNS specification
>for regular blocking case:
>- size of all tiles is equal to size of block reported for input raster
>- all tiles have the same size
>- all tiles fit common grid
>- ...

I should have said "I want to store many rasters in the same table while retiling them", as you plan in 2). In other words, it is very useful to be able to load a big raster retiled as one table of tiles, but generally I will want to store many of them (datasets of type 1 and 2), still retiling them into one table of tiles that is not necessarily globally rectangular.
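
As a rough Python sketch of this use case, assume every source raster is described by its position and size (x_off, y_off, width, height) in one shared integer pixel space, and that all of them are retiled on the same block grid. The function names and the dictionary-based "table" are hypothetical; nothing here is taken from gdal2wktraster.py.

```python
def cells_for_raster(x_off, y_off, width, height, block_w, block_h):
    """Return the set of (col, row) grid cells touched by one raster."""
    first_col = x_off // block_w
    first_row = y_off // block_h
    last_col = (x_off + width - 1) // block_w
    last_row = (y_off + height - 1) // block_h
    return {(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)}


def retile(rasters, block_w, block_h):
    """Map each grid cell to the source rasters contributing to it.

    `rasters` maps a name to (x_off, y_off, width, height). Each key of the
    result stands for one tile, i.e. one row of the single target table.
    """
    table = {}
    for name, (x_off, y_off, width, height) in rasters.items():
        for cell in cells_for_raster(x_off, y_off, width, height,
                                     block_w, block_h):
            table.setdefault(cell, []).append(name)
    return table


# Two rasters forming an L shape: the coverage is regular but incomplete.
rasters = {
    "a": (0, 0, 200, 100),
    "b": (0, 100, 100, 100),
}
table = retile(rasters, block_w=100, block_h=100)
```

Here the table ends up with three tiles and no tile for cell (1, 1), which is the "not necessarily globally rectangular" result described above.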

>I am reading your specification document [1] and there is a section
>called "Three ways to use a WKT raster table".
>
>[1] http://www.cef-cfr.ca/uploads/Membres/WKTRasterSpecifications0.8.pdf
>
>Let's confront this slide with current version of gdal2wktraster
>and what I see is that:
>
>1. Image warehouse - it is (almost) perfectly supported now.
>It's possible to load N number of miscellaneous images into a table.
>
>2. A vector-like discrete coverage - it's also possible, isn't it?
>Similar use of gdal2wktraster.py as in point 1.
>
>3. A continuous tiled coverage - it is a relaxed version of regular blocking
>use case. However, with help from Frank and Martin, we have defined
>RASTER_COLUMNS and applied a bunch of restrictions to make this use
>case more clear and simpler to handle by client applications.
>For instance, on your slide it's said "images may overlap",
>which is not allowed in regular blocking for well-known reasons.
>And, this is the use case I'm going to support now in gdal2wktraster.

As I drew in the new PowerPoint, 3 should actually be divided into three storage cases:

a) the resulting extent is rectangular but incomplete and tiles do not overlap. This is a dataset of type 1.

b) the resulting extent is rectangular and complete (every grid cell holds a tile) and tiles do not overlap. This is a dataset of type 2.

c) the coverage is composed of many images stored in many tables and they could overlap. This is a dataset of type 4.

Strict regular blocking applies only to b) and c).

>Certainly, it's possible to list yet another use case which actually
>is a simplification of 1) or 2):
>
>4) Whole raster loaded into single table into single row.

This is type 5. Image warehouse.

>> WKT Raster was not planned for stupidly storing hundreds of big rasters
>> as hundreds of tables.
>
>Pierre, where am I suggesting anything like that?
>On the contrary, I'm emphasizing use cases in which raster is loaded
>into a single table (as a whole or tiled, but tiled in regular blocking way).
>Just to quote myself:
>
>1) Load 1..N raster files into a single table. 
>2) Load 1 raster file (...) 1 raster is tiled according to block size
>and tiles are loaded into a single table (one tile per row).

1) is mostly good for datasets of type 5, and for types 1 and 2 if users have previously retiled their files using GDAL or other tools.

2) is good for datasets of type 4 as long as it only handles one raster at a time. If it worked with wildcards, it would be good for types 1 and 2 as well. I think this is not a very complicated extra step for us and it adds very useful functionality.
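
The wildcard step itself could be as small as the following Python sketch, which only uses the standard glob module. `expand_sources` is a hypothetical helper, not an existing gdal2wktraster.py function, and the temporary files exist only to make the example self-contained.

```python
import glob
import os
import tempfile


def expand_sources(patterns):
    """Expand shell-style wildcards into an ordered, duplicate-free file list."""
    files = []
    seen = set()
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            if path not in seen:
                seen.add(path)
                files.append(path)
    return files


# Demonstration on throwaway files: only the *.tif files are selected.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.tif", "b.tif", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    pattern = os.path.join(folder, "*.tif")
    found = [os.path.basename(p) for p in expand_sources([pattern])]
```

Each file in the resulting list would then be tiled exactly as in 2), with all tiles going to the same table.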

>> This is a much more modern usage of a
>> raster coverage. Storing many rasters representing the same variables
>> as many tables does not make much sense and is not very useful from an
>> analytical point of view.
>
>Again, I've never ever suggested "many tables".

When you store "big rasters", each new raster goes into a new table. This is the GeoRaster way. Not very useful. This is my whole point and what I am worried about.

>I am for coverages and this is exactly what I'm working on now.
>Regular blocking is a coverage and it represents one of use cases you have
>drawn in the specification (PDF), with some extra restrictions,
>which actually do not change the nature of the use case at all.

I'm sorry if these drawings were misleading. Reality is a bit more complex.

>However, I'd like to add:
>
>- I haven't said I'm against off-db rasters. I have only said that
>off-db is not the part of work I can pick up, but raster blocking
>support *is* the part of work I can. So, I'm focused on this particular
>features which *is* included in the specification, is on the roadmap.

Understood.

>- (or remind) gdal2wktraster was started as a prototype,
>a proof of concept, before development of fully-featured
>raster2pgsql utility is started. gdal2wktraster is more a
>complementary feature, isn't it? And, I believe I have a
>liberty to not to want to develop support of off-db rasters
>in this very simple tool. If anyone wants to do it, great,
>feel free to contribute it. But please, don't blame me
>because I'm not particularly interested in adding off-db
>support to gdal2wktraster.py.

Ok. My apologies. I think we will stick with gdal2wktraster.py for a while as it would take a huge amount of time to build the equivalent in C.

As long as blocking is not implemented in gdal2wktraster.py and users create their tiles themselves, off-db is fully supported by the single -R option.

As soon as gdal2wktraster.py creates the tiles itself and does not write them to a folder in the filesystem, -R is broken, since it can't write the path in the stored band.

I would suggest that when regular blocking is activated together with the -R option, we return an error: "raster registration is not available with regular blocking".
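
A minimal Python sketch of that check could look like this. The function name and its two parameters are assumptions made for the illustration (`register` stands for the -R option, `block_size` for the blocking option); only the error message comes from the suggestion above.

```python
def check_options(register, block_size):
    """Reject the combination of raster registration and regular blocking.

    register:   True when -R (register, off-db) was requested.
    block_size: (width, height) when blocking was requested, else None.
    """
    if register and block_size is not None:
        raise ValueError(
            "raster registration is not available with regular blocking")


# Either option alone is accepted.
check_options(register=True, block_size=None)
check_options(register=False, block_size=(100, 100))

# Both together are refused.
try:
    check_options(register=True, block_size=(100, 100))
    message = None
except ValueError as error:
    message = str(error)
```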

I think gdal2wktraster.py should have:
-source options
	-one file or
	-a series determined with wildcards
-process options
	-I index
	-t blocking (or tiling)
	-N nodatavalue
	-R register
	-s
-destination options (available only when the source uses wildcards, i.e. many source files)
	-c/d
	-t
	-X (each file in a new table; otherwise all the tiles go in the same table)
	
>- I believe our goals are very well aligned to the WKT Raster
>specification. Please correct me if I'm wrong, and where.

I think you focus on type 4 and I would prefer to focus on types 1 and 2.

Pierre
-------------- next part --------------
A non-text attachment was scrubbed...
Name: WKTRasterSpecifications1.0.pdf
Type: application/octet-stream
Size: 2579607 bytes
Desc: WKTRasterSpecifications1.0.pdf
URL: <http://lists.osgeo.org/pipermail/postgis-devel/attachments/20090320/d080d740/attachment.obj>

