[postgis-devel] [WKT Raster] Regular blocking in gdal2wktraster.py
Chris Hodgson
chodgson at refractions.net
Fri Mar 27 11:50:32 PDT 2009
Just so you understand where I'm coming from, I was involved in the
pre-existing raster support in Postgis and in developing tools, both
GDAL-based and scratch-built, to do useful things with it. In
particular, that meant building, pyramiding, storing, and managing DEM
data for North America at multiple resolutions, to build 3-d terrain
models on the fly from data stored in the "old" Postgis "chip" raster
type. I have also been involved in some fairly significant raster
processing involving mosaicking an orthophoto coverage for the province
of British Columbia.
So again, at this point I'm not in a position to contribute to the
actual development so I'm just offering my opinions, as I think my
experience with some of the possible use cases may be of value.
> I might want to create smaller tiles because:
>
> -the size of my source images do not fit the blocking size I want in the DB AND
> -I will afterward want to append more tiles that would overlap if I fill small tiles with nodatavalues AND
> -my application does not care about regular blocking
>
If the data you add afterward overlaps with the existing tiles, it
should be written into those existing tiles. Whatever script you run to
do this loading should have options to specify whether it overwrites
only existing no-data values or overwrites everything.
This supports the use case of building a "big raster" out of a set of
potentially overlapping, irregular input images - for example,
mosaicking orthophotos or satellite imagery.
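
A minimal sketch of the two overwrite policies described above, assuming
a tile is a plain list of rows and that no-data is a single sentinel
value. The names NODATA and merge_into_tile are hypothetical and are not
part of gdal2wktraster.py or WKT Raster:

```python
NODATA = -9999  # hypothetical no-data sentinel used only in this sketch


def merge_into_tile(tile, incoming, row_off, col_off, overwrite_all=False):
    """Write `incoming` into `tile` in place, starting at (row_off, col_off).

    With overwrite_all=False only cells currently holding NODATA are filled.
    With overwrite_all=True every overlapping cell is replaced.
    Incoming NODATA cells never overwrite anything in either mode.
    """
    n_rows, n_cols = len(tile), len(tile[0])
    for r, row in enumerate(incoming):
        for c, value in enumerate(row):
            tr, tc = row_off + r, col_off + c
            # Cells outside this tile belong to a neighbouring tile.
            if not (0 <= tr < n_rows and 0 <= tc < n_cols):
                continue
            if value == NODATA:
                continue
            if overwrite_all or tile[tr][tc] == NODATA:
                tile[tr][tc] = value
    return tile


existing = [[1, NODATA], [NODATA, 4]]
new_image = [[9, 9], [9, 9]]

filled = merge_into_tile([row[:] for row in existing], new_image, 0, 0)
replaced = merge_into_tile([row[:] for row in existing], new_image, 0, 0,
                           overwrite_all=True)
```

In the first call only the two no-data cells take the new value; in the
second every overlapping cell does.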
>> If you do create small tiles, then it creates additional problems when
>> you later want to add to your raster with another image file ... either
>> the smaller bits stay there and throw off the relative tiling of the
>> next pieces,
>>
>
> No, since every tiles is georeferenced. The next image tiles will not throw off anything.
>
>> or you must remove the existing little tiles and copy their
>> data to larger "regular size" tiles in order to append to the raster.
>>
>
> This is just if you need to stick to regular blocking.
>
I understand how geo-referencing works. But yes, we are really saying
nothing here - if we allow irregularity, we will have irregularity - my
argument is that allowing it does not simplify anything, and it
definitely hurts performance.
>
> In WKT raster each row is an independent raster (contrary to Oracle GeoRaster). We support non-regular blocking (i.e. variable-size tiles) to efficiently rasterize geometry columns (one feature = one raster) and easily implement vector/raster operations. We also have constraints to ensure regular blocking in case users need it. We support both worlds because both worlds have pros and cons for specific applications.
>
Ok, I think I understand the issue now. You want to be able to treat the
raster results of operations in the same way as any other raster (a.k.a.
tile). In this case it seems to me that you need to allow overlapping
tiles as well, as the results of "select rasterize(line_column)"
certainly have the possibility of overlapping - the result being that
you don't have a single raster, you have a collection of them, one for
each of the original features you had. If I want to overlay a collection
of features onto my set of regular raster tiles, the result would still
be regularly tiled.
I guess I can't think of any operations that would result in a single
raster with variable-size tiles, assuming you force all raster data to a
regular tile-size on input.
> From my point of view regular blocking is optional. It is a specific way of a more general way to store tiles. Does applications like Google Earth profit from regular blocking? Why is it better to query for regular blocked tiles in chunks than just querying for tiles with a proper SQL query and display them as independent raster forming a layer? This is the way vector layer works and nobody complains.
>
Google Earth absolutely benefits from regular blocking, because of how
it simplifies data retrieval and caching. Locality of reference is the
reason why it is better to request tiles as they are, rather than
request the database resize/crop them to your request. A truth common to
both mapping and analysis is that you are very likely to request the
data near the data you just requested. Why ask for only half a tile? If
you knew that you were requesting half, you would have asked for the
whole thing, because your next request is likely to be for the other
half of the same tile.
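
To make the caching argument concrete, here is a rough sketch of how a
client could snap an arbitrary request box to whole tiles on a regular
grid. It assumes square tiles of side tile_size (in map units) anchored
at a grid origin, and a request box with non-zero area; the function
name and parameters are made up for illustration:

```python
import math


def tiles_for_request(minx, miny, maxx, maxy, origin_x, origin_y, tile_size):
    """Return (col, row) indexes of every regular tile the bbox overlaps.

    Columns grow in +x and rows in +y from (origin_x, origin_y).
    A bbox that only touches a tile edge does not count as overlapping it.
    """
    first_col = math.floor((minx - origin_x) / tile_size)
    last_col = math.ceil((maxx - origin_x) / tile_size) - 1
    first_row = math.floor((miny - origin_y) / tile_size)
    last_row = math.ceil((maxy - origin_y) / tile_size) - 1
    return [(col, row)
            for row in range(first_row, last_row + 1)
            for col in range(first_col, last_col + 1)]


# Two requests covering the left and right halves of one 100-unit tile.
left_half = tiles_for_request(0, 0, 50, 100, 0, 0, 100)
right_half = tiles_for_request(50, 0, 100, 100, 0, 0, 100)

# A request straddling four tiles.
straddling = tiles_for_request(50, 50, 150, 150, 0, 0, 100)
```

Both half-tile requests resolve to the same tile key, so a cache keyed
on (col, row) serves the second request without going back to the
database. With variable-size tiles there is no such key to compute on
the client side.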
And actually, I do have a complaint about the current vector layer
indexing in Postgis - currently the R-tree graph created by indexing is
so random that clustering on it has no useful effect. Ideally there
would be a way to pack the index (creating an "ideal" R-tree for the
data currently in the table) so that clustering on it would provide the
same benefits of locality of reference that you can get by clustering
your tables on other index types. Various packing routines are known for
R-trees, but implementing them in the database is unfortunately very
difficult - I've been looking into it.
Not sure if you are aware of the concept of clustering - the idea is to
physically order the data in the table in the same order as it is in the
index. In this way, the results of bbox-based queries are likely to
actually be next to each other on the hard disk - meaning fewer blocks
have to be read from disk by the database and increasing performance
significantly. I should note that this only works if you aren't actively
changing the data as this breaks up the clustering and hurts the
performance - but often your source data for analysis is not changing,
nor is your production map-producing data - or at least they change
rarely enough that the cost of rebuilding the index and clustering is
worth it for the performance benefit.
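
The effect can be illustrated with a toy model that counts how many
disk pages hold the rows matched by a bbox query under two physical
orderings. Everything here is an assumption for illustration: 16 rows
per page, a 16 x 16 grid of tiles, and a simple 4 x 4 block ordering
standing in for whatever order a packed spatial index would give. It
does not model how Postgis or PostgreSQL actually lay out pages:

```python
def pages_touched(row_order, wanted, rows_per_page):
    """Count distinct pages holding the `wanted` rows.

    `row_order` is the physical order of row ids in the table; page n
    holds positions n*rows_per_page .. (n+1)*rows_per_page - 1.
    """
    position = {row_id: i for i, row_id in enumerate(row_order)}
    return len({position[row_id] // rows_per_page for row_id in wanted})


ROWS_PER_PAGE = 16

# Physical order as loaded: row by row across the whole grid.
load_order = [(col, row) for row in range(16) for col in range(16)]

# "Clustered" order: tiles grouped into 4 x 4 spatial blocks, so that
# neighbouring tiles end up on the same page.
clustered_order = sorted(
    load_order, key=lambda t: (t[1] // 4, t[0] // 4, t[1], t[0]))

# A bbox query matching the 4 x 4 block of tiles at cols 4-7, rows 4-7.
wanted = [(col, row) for row in range(4, 8) for col in range(4, 8)]

unclustered_pages = pages_touched(load_order, wanted, ROWS_PER_PAGE)
clustered_pages = pages_touched(clustered_order, wanted, ROWS_PER_PAGE)
```

In this toy setup the same 16 rows sit on four pages in load order and
on a single page in the clustered order.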
Chris