[postgis-devel] Speeding up ST_Intersects(raster, geometry)
Pierre Racine
Pierre.Racine at sbf.ulaval.ca
Tue May 11 14:06:40 PDT 2010
Regina, Jorge, Mat,
I'm writing ST_Intersects(raster, geometry) and ST_Intersection(raster, geometry). ST_Intersects(raster, geometry) should be defined like this:
CREATE OR REPLACE FUNCTION ST_Intersects(geometry, raster, integer) RETURNS boolean AS $$
SELECT $1 && $2 AND st_intersects($1,st_convexhull($2)) AND
(NOT st_bandhasnodatavalue($2, $3)) OR st_intersects($1, st_polygon($2, $3)))
$$ LANGUAGE 'SQL' IMMUTABLE;
Hence, a geometry intersects with a raster if:
1) their bounding boxes intersect (quick)
AND
2) the rotated box of the raster and the geometry intersect (quick)
AND
3) there is not a nodata value defined for the band (quick) or the geometry intersects intersects with the withvalue area (slow)
The problem goes like this: if there is a nodatavalue defined for the whole raster and the geometry intersects with the bounding box of a tile (and the rotated box) we must compute the intersect between the geometry and the polygonization (st_polygon) of the withvalue area in order to make sure the geometry significantly intersect with the tile. However, up to now, st_polygon is very slow because it is the st_union of all the polygonized part of the tile. This makes ANY intersect test between a raster having a significant nodata value and a geometry intersecting its bounding box (or its rotated box) way too slow for nothing since even if there is no pixel with nodata value, we have to polygonize all the tile anyway to end up with a geometry identical to the envelope (which is itself VERY fast to compute). Here are some solutions to this problem. Let me know your feelings:
1) We could make st_polygon faster by implementing a new smarter polygonize function in GDAL that would only return the polygon associated with a specific value. In this case we could polygonize only the nodata value and make the test with the difference between the envelope and this nodata geometry (the result is the withvalue area). Unformtunately this would still be way too slow. I'm not even sure this "smarter" polygonize function could be faster than the existing one since it would probably have to parse every pixel anyway. I don't know if constructing only one geometry is much faster than constructing all the geometries resulting from the polygonization. Probably not much enough anyway.
2) We could generally just always IGNORE the nodata values, threat them as real data values and let the responsability to reject a test or a resulting geometry to the application. (I don't really like this idea...)
3) We could add a ContainsNoDataValue to each tiles in addition to the HasNodataValue one. (In this case HasNodataValue lose a bit of its meaning and we should probably rename it TrueNoDataValue.) TrueNoDataValue would mean "the stored nodata value is a true nodata value". ContainsNoDataValue would mean "there are some pixels with nodata value in this raster or tile. In this case we can use this flag in the st_intersect test and compute the st_polygon ONLY if the raster ContainsNoDataValue. We have only two free slots for flags like this remaining, but at the same time they are there to fix this kind of cache problem.
4) We could redefine (since it has not been really defined yet) HasNodataValue to "the stored nodata value is a true nodata value AND the raster contains at least one of them. That means some tiles would have the flag set and some others not. After all, what is the usefulness of knowing that the nodata value is a true nodata value if there are none in the raster?... In this case I would have just to change gdal2wktraster.py and ST_SetBandHasNoDataValue() (and maybe add a trigger to ST_SetBandNodataValue()) so they set the flag of each tile properly.
I hesitate between 3 and 4... 4 is minimal but maybe too minimal and make us lose some important information. 3 seems to give too much importance to the nodata value problem.
Pierre
More information about the postgis-devel
mailing list