[postgis-devel] [WKT Raster] Regular blocking in gdal2wktraster.py

Fri Mar 27 11:50:32 PDT 2009

  Just so you understand where I'm coming from: I was involved in the 
pre-existing raster support in PostGIS, and in developing tools (some 
based on GDAL, some built from scratch) to do useful things with it. In 
particular, I worked on building, pyramiding, storing, and managing DEM 
data for North America at multiple resolutions, in order to build 3-D 
terrain models on the fly from data stored in the "old" PostGIS "chip" 
raster type. I have also been involved in some fairly significant raster 
processing, mosaicking an orthophoto coverage for the province of 
British Columbia.

So again, at this point I'm not in a position to contribute to the 
actual development so I'm just offering my opinions, as I think my 
experience with some of the possible use cases may be of value.
> I might want to create smaller tiles because:
>
> -the size of my source images do not fit the blocking size I want in the DB  AND
> -I will afterward want to append more tiles that would overlap if I fill small tiles with nodatavalues AND
> -my application does not care about regular blocking
>   
If the data you add afterward overlaps with the existing tiles, it 
should be written into those existing tiles. Whatever script you run to 
do this loading should have options to specify whether to overwrite 
only existing no-data values or to overwrite everything. 
This supports the use case of building a "big raster" out of a set of 
potentially overlapping, irregular input images - for example, 
mosaicking orthophotos or satellite imagery.
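
To make the two overwrite modes concrete, here is a minimal Python sketch. 
It is not taken from gdal2wktraster.py or any existing loader: the function 
name merge_into_tile, the overwrite_all flag, and the list-of-lists tile 
layout are all made up for illustration. It also assumes the incoming 
pixels have already been aligned to the grid of the existing tile, and 
that incoming no-data pixels never overwrite anything.

```python
def merge_into_tile(existing, incoming, nodata, overwrite_all=False):
    """Return a copy of `existing` updated with pixels from `incoming`.

    Both tiles are equal-sized lists of rows on the same grid (assumption).
    overwrite_all=False: only fill pixels that are currently no-data.
    overwrite_all=True:  replace every pixel the incoming tile has data for.
    """
    merged = [row[:] for row in existing]  # leave the stored tile untouched
    for r, row in enumerate(incoming):
        for c, value in enumerate(row):
            if value == nodata:
                continue  # incoming pixel carries no information
            if overwrite_all or merged[r][c] == nodata:
                merged[r][c] = value
    return merged


NODATA = -9999
existing = [[1, NODATA], [NODATA, 4]]
incoming = [[9, 2], [NODATA, 8]]

fill_only = merge_into_tile(existing, incoming, NODATA)
replace_all = merge_into_tile(existing, incoming, NODATA, overwrite_all=True)
```

With fill-only mode the first image loaded wins wherever images overlap; 
with overwrite-everything mode the last one wins. Either way the tile 
grid itself never changes.
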
>> If you do create small tiles, then it creates additional problems when
>> you later want to add to your raster with another image file ... either
>> the smaller bits stay there and throw off the relative tiling of the
>> next pieces, 
>>     
>
> No, since every tiles is georeferenced. The next image tiles will not throw off anything.
>
>> or you must remove the existing little tiles and copy their
>> data to larger "regular size" tiles in order to append to the raster.
>>     
>
> This is just if you need to stick to regular blocking.
>   
I understand how geo-referencing works. But yes, we are really saying 
nothing here - if we allow irregularity, we will have irregularity - my 
argument is that allowing it does not simplify anything, and it 
definitely hurts performance.
>
> In WKT raster each row is an independent raster (contrary to Oracle GeoRaster). We support non-regular blocking (i.e. variable-size tiles) to efficiently rasterize geometry columns (one feature = one raster) and easily implement vector/raster operations. We also have constraints to ensure regular blocking in case users need it. We support both worlds because both worlds have pros and cons for specific applications.
>   
Ok, I think I understand the issue now. You want to be able to treat the 
raster results of operations in the same way as any other raster (a.k.a. 
tile). In that case it seems to me that you need to allow overlapping 
tiles as well, since the results of "select rasterize(line_column)" 
certainly have the possibility of overlapping. The result is that you 
don't have a single raster; you have a collection of them, one for each 
of the original features. If I want to overlay a collection of features 
onto my set of regular raster tiles, the result would still be regularly 
tiled.

I guess I can't think of any operations that would result in a single 
raster with variable-size tiles, assuming you force all raster data to a 
regular tile-size on input.
> From my point of view regular blocking is optional. It is a specific way of a more general way to store tiles. Does applications like Google Earth profit from regular blocking? Why is it better to query for regular blocked tiles in chunks than just querying for tiles with a proper SQL query and display them as independent raster forming a layer? This is the way vector layer works and nobody complains.
>   

Google Earth absolutely benefits from regular blocking, because of how 
it simplifies data retrieval and caching. Locality of reference is the 
reason why it is better to request tiles as they are, rather than 
request that the database resize/crop them to fit your request. A truth 
common to both mapping and analysis is that you are very likely to 
request data near the data you just requested. Why ask for only half a 
tile? If you knew you were requesting half, you would have asked for the 
whole thing, because your next request is likely to be for the other 
half of the same tile.
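
As a rough illustration of why whole-tile requests cache so well, here is 
a small Python sketch. Everything in it is hypothetical: the names 
tiles_for_bbox and fetch, the 100-unit tile size, and the grid origin at 
(0, 0) with rows counted upward are assumptions for the example, not how 
Google Earth or WKT Raster actually address tiles.

```python
import math


def tiles_for_bbox(minx, miny, maxx, maxy, origin_x, origin_y, tile_w, tile_h):
    """Return (col, row) keys of every regular tile intersecting the bbox."""
    first_col = math.floor((minx - origin_x) / tile_w)
    last_col = math.ceil((maxx - origin_x) / tile_w) - 1
    first_row = math.floor((miny - origin_y) / tile_h)
    last_row = math.ceil((maxy - origin_y) / tile_h) - 1
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


cache = set()  # keys of whole tiles the client already holds


def fetch(bbox):
    """Return the tile keys that must be read from the database for bbox."""
    minx, miny, maxx, maxy = bbox
    needed = tiles_for_bbox(minx, miny, maxx, maxy, 0, 0, 100, 100)
    misses = [key for key in needed if key not in cache]
    cache.update(misses)
    return misses


first = fetch((50, 50, 150, 120))   # initial view: touches four tiles
second = fetch((60, 50, 160, 120))  # small pan: same four tiles, no new reads
```

Because a regular grid turns a bbox into a predictable set of tile keys, 
the second request is answered entirely from the cache. With cropped or 
irregular tiles there is no such stable key to cache on.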

And actually, I do have a complaint about the current vector layer 
indexing in Postgis - currently the R-tree graph created by indexing is 
so random that clustering on it has no useful effect. Ideally there 
would be a way to pack the index (creating an "ideal" R-tree for the 
data currently in the table) so that clustering on it would provide the 
same benefits of locality of reference that you can get by clustering 
your tables on other index types. Various packing routines are known for 
R-trees, but implementing them in the database is unfortunately very 
difficult - I've been looking into it.

Not sure if you are aware of the concept of clustering - the idea is to 
physically order the data in the table in the same order as it is in the 
index. That way, the results of bbox-based queries are likely to 
actually be next to each other on the hard disk, so the database has to 
read fewer blocks from disk and performance increases significantly. I 
should note that this only works if you aren't actively changing the 
data, as changes break up the clustering and hurt performance - but 
often your source data for analysis is not changing, nor is your 
production map-producing data - or at least they change rarely enough 
that the cost of rebuilding the index and re-clustering is worth it for 
the performance benefit.
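
The effect can be shown with a toy Python simulation. This is only a 
sketch under stated assumptions: rows are points on a 64 x 64 grid, a 
"page" holds 64 rows, and a Z-order (Morton) sort stands in for a 
well-packed spatial ordering. It is not what the PostGIS index or 
PostgreSQL's CLUSTER command does internally, and the names morton_key 
and pages_touched are invented for the example.

```python
import random


def morton_key(x, y, bits=6):
    """Interleave the bits of x and y so nearby points get nearby keys."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def pages_touched(rows, bbox, rows_per_page):
    """Count distinct pages holding at least one row inside the bbox.

    `rows` is a list of (x, y) points in physical table order.
    """
    minx, miny, maxx, maxy = bbox
    pages = set()
    for position, (x, y) in enumerate(rows):
        if minx <= x <= maxx and miny <= y <= maxy:
            pages.add(position // rows_per_page)
    return len(pages)


points = [(x, y) for x in range(64) for y in range(64)]

shuffled = points[:]
random.Random(0).shuffle(shuffled)  # table in arbitrary insertion order

clustered = sorted(points, key=lambda p: morton_key(p[0], p[1]))

query = (0, 0, 7, 7)  # bbox matching 64 of the 4096 rows
unclustered_pages = pages_touched(shuffled, query, rows_per_page=64)
clustered_pages = pages_touched(clustered, query, rows_per_page=64)
```

In the spatially ordered table all 64 matching rows sit on a single 
page, while in the shuffled table they are scattered across many pages. 
That difference in pages read is the benefit I'd like to be able to get 
from clustering on the spatial index.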

Chris