[postgis-devel] Speeding up ST_Intersects(raster, geometry)

Wed May 12 07:32:08 PDT 2010

>I would think it would be easy enough to figure out the the row and column
>ranges you need to spot check based on the convex hull alone.

The surface covered by a geometry is rarely defined by a simple set of row and column. We need to define a complex zone which in the end might be defined by many sets of rows and columns. In the case where the raster has a nodata value defined but there is no pixel with nodata value in the raster tile this trick is still much slower than proposal 3 and 4...

>Then when you find one pixel in that range that is not nodatavalue  -- you
>return true.

Two things are new and interesting in this proposal:

-we can optimize st_intersect further by limiting the st_aspolygon to the commun extent between the geometry and the raster. This way we probably often gain lots of polygonisation processing (or whatever processing we need to do).

-it is probably be more efficient to scan this zone searching for withdata values than to polygonize it

So what I see to implement this is some king of plpgsql function that would:

1) intersect the envelope of the raster with the geometry (fast).

2) rasterize (planned ST_AsRaster(geometry)) this intersection into a new raster aligned on the original raster (this raster acts as a mask and is a way to construct the complex sets of rows and columns covering the surface of the geometry) (should be fast enough).

3) add a band to this raster containing the values of the original raster (this looks like a raster/raster intersection) (should be fast).

4) scan this two bands in parallel looking for withdata values in the original raster where there is a withvalue pixel in the mask. As soon as one is found, return true (fast enough).

Would this be faster than polygonizing the whole (or just the shared part) of the raster? I really have no idea... For sure this implies the implementation of a simple version of ST_AsRaster(geometry).

Another approach (which correspond probably more to what you have in mind) is:

1) intersect the geometry with a grid of point representing the pixel centroids of the raster (fast enough)

2) scann each of the resulting points for the presence of a withdata value using the upcoming ST_Value(point) (I don't know the speed of ST_Value)

This is certainly much easier/faster to implement. Is it faster?

In the end I dont think this prevent us from implementing some sort of caching for the presence/absence of nodata values in the raster (proposal 3 or 4). Which proposal do you prefer? The same way at some point we will have to cache some statistics of the values (max, min, mean) (and upgrade the WKB specification).

Pierre

>-----Original Message-----
>From: postgis-devel-bounces at postgis.refractions.net
>[mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of Pierre
>Racine
>Sent: Tuesday, May 11, 2010 5:07 PM
>To: postgis-devel at postgis.refractions.net
>Subject: [postgis-devel] Speeding up ST_Intersects(raster, geometry)
>
>Regina, Jorge, Mat,
>
>I'm writing ST_Intersects(raster, geometry) and ST_Intersection(raster,
>geometry). ST_Intersects(raster, geometry) should be defined like this:
>
>CREATE OR REPLACE FUNCTION ST_Intersects(geometry, raster, integer) RETURNS
>boolean AS $$
>    SELECT $1 && $2 AND st_intersects($1,st_convexhull($2)) AND
>        (NOT st_bandhasnodatavalue($2, $3)) OR st_intersects($1,
>st_polygon($2, $3)))
>        $$ LANGUAGE 'SQL' IMMUTABLE;
>
>Hence, a geometry intersects with a raster if:
>
>1) their bounding boxes intersect (quick)
>
>AND
>
>2) the rotated box of the raster and the geometry intersect (quick)
>
>AND
>
>3) there is not a nodata value defined for the band (quick) or the geometry
>intersects intersects with the withvalue area (slow)
>
>The problem goes like this: if there is a nodatavalue defined for the whole
>raster and the geometry intersects with the bounding box of a tile (and the
>rotated box) we must compute the intersect between the geometry and the
>polygonization (st_polygon) of the withvalue area in order to make sure the
>geometry significantly intersect with the tile. However, up to now,
>st_polygon is very slow because it is the st_union of all the polygonized
>part of the tile. This makes ANY intersect test between a raster having a
>significant nodata value and a geometry intersecting its bounding box (or
>its rotated box) way too slow for nothing since even if there is no pixel
>with nodata value, we have to polygonize all the tile anyway to end up with
>a geometry identical to the envelope (which is itself VERY fast to compute).
>Here are some solutions to this problem. Let me know your feelings:
>
>1) We could make st_polygon faster by implementing a new smarter polygonize
>function in GDAL that would only return the polygon associated with a
>specific value. In this case we could polygonize only the nodata value and
>make the test with the difference between the envelope and this nodata
>geometry (the result is the withvalue area). Unformtunately this would still
>be way too slow. I'm not even sure this "smarter" polygonize function could
>be faster than the existing one since it would probably have to parse every
>pixel anyway. I don't know if constructing only one geometry is much faster
>than constructing all the geometries resulting from the polygonization.
>Probably not much enough anyway.
>
>2) We could generally just always IGNORE the nodata values, threat them as
>real data values and let the responsability to reject a test or a resulting
>geometry to the application. (I don't really like this idea...)
>
>3) We could add a ContainsNoDataValue to each tiles in addition to the
>HasNodataValue one. (In this case HasNodataValue lose a bit of its meaning
>and we should probably rename it TrueNoDataValue.) TrueNoDataValue would
>mean "the stored nodata value is a true nodata value". ContainsNoDataValue
>would mean "there are some pixels with nodata value in this raster or tile.
>In this case we can use this flag in the st_intersect test and compute the
>st_polygon ONLY if the raster ContainsNoDataValue. We have only two free
>slots for flags like this remaining, but at the same time they are there to
>fix this kind of cache problem.
>
>4) We could redefine (since it has not been really defined yet)
>HasNodataValue to "the stored nodata value is a true nodata value AND the
>raster contains at least one of them. That means some tiles would have the
>flag set and some others not. After all, what is the usefulness of knowing
>that the nodata value is a true nodata value if there are none in the
>raster?... In this case I would have just to change gdal2wktraster.py and
>ST_SetBandHasNoDataValue() (and maybe add a trigger to
>ST_SetBandNodataValue()) so they set the flag of each tile properly.
>
>I hesitate between 3 and 4... 4 is minimal but maybe too minimal and make us
>lose some important information. 3 seems to give too much importance to the
>nodata value problem.
>
>Pierre
>
>_______________________________________________
>postgis-devel mailing list
>postgis-devel at postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-devel
>
>
>_______________________________________________
>postgis-devel mailing list
>postgis-devel at postgis.refractions.net
>http://postgis.refractions.net/mailman/listinfo/postgis-devel