[postgis-devel] Speeding up ST_Intersects(raster, geometry)

Paragon Corporation lr at pcorp.us
Tue May 11 15:10:08 PDT 2010


Pierre,

Leo and I thought about this a bit.

You don't really need to polygonize the  raster to determine intersects,
though you probably will need to for intersection.

You just need to find one pixel that intersects the geometry and declare
victory.

I would think it would be easy enough to figure out the the row and column
ranges you need to spot check based on the convex hull alone.

Then when you find one pixel in that range that is not nodatavalue  -- you
return true.

Though maybe we are misunderstanding.

Hope that helps,
Regina and Leo

 

-----Original Message-----
From: postgis-devel-bounces at postgis.refractions.net
[mailto:postgis-devel-bounces at postgis.refractions.net] On Behalf Of Pierre
Racine
Sent: Tuesday, May 11, 2010 5:07 PM
To: postgis-devel at postgis.refractions.net
Subject: [postgis-devel] Speeding up ST_Intersects(raster, geometry)

Regina, Jorge, Mat,

I'm writing ST_Intersects(raster, geometry) and ST_Intersection(raster,
geometry). ST_Intersects(raster, geometry) should be defined like this:

CREATE OR REPLACE FUNCTION ST_Intersects(geometry, raster, integer) RETURNS
boolean AS $$
    SELECT $1 && $2 AND st_intersects($1,st_convexhull($2)) AND
        (NOT st_bandhasnodatavalue($2, $3)) OR st_intersects($1,
st_polygon($2, $3)))
        $$ LANGUAGE 'SQL' IMMUTABLE;

Hence, a geometry intersects with a raster if:

1) their bounding boxes intersect (quick)

AND

2) the rotated box of the raster and the geometry intersect (quick)

AND

3) there is not a nodata value defined for the band (quick) or the geometry
intersects intersects with the withvalue area (slow)

The problem goes like this: if there is a nodatavalue defined for the whole
raster and the geometry intersects with the bounding box of a tile (and the
rotated box) we must compute the intersect between the geometry and the
polygonization (st_polygon) of the withvalue area in order to make sure the
geometry significantly intersect with the tile. However, up to now,
st_polygon is very slow because it is the st_union of all the polygonized
part of the tile. This makes ANY intersect test between a raster having a
significant nodata value and a geometry intersecting its bounding box (or
its rotated box) way too slow for nothing since even if there is no pixel
with nodata value, we have to polygonize all the tile anyway to end up with
a geometry identical to the envelope (which is itself VERY fast to compute).
Here are some solutions to this problem. Let me know your feelings:

1) We could make st_polygon faster by implementing a new smarter polygonize
function in GDAL that would only return the polygon associated with a
specific value. In this case we could polygonize only the nodata value and
make the test with the difference between the envelope and this nodata
geometry (the result is the withvalue area). Unformtunately this would still
be way too slow. I'm not even sure this "smarter" polygonize function could
be faster than the existing one since it would probably have to parse every
pixel anyway. I don't know if constructing only one geometry is much faster
than constructing all the geometries resulting from the polygonization.
Probably not much enough anyway.

2) We could generally just always IGNORE the nodata values, threat them as
real data values and let the responsability to reject a test or a resulting
geometry to the application. (I don't really like this idea...)

3) We could add a ContainsNoDataValue to each tiles in addition to the
HasNodataValue one. (In this case HasNodataValue lose a bit of its meaning
and we should probably rename it TrueNoDataValue.) TrueNoDataValue would
mean "the stored nodata value is a true nodata value". ContainsNoDataValue
would mean "there are some pixels with nodata value in this raster or tile.
In this case we can use this flag in the st_intersect test and compute the
st_polygon ONLY if the raster ContainsNoDataValue. We have only two free
slots for flags like this remaining, but at the same time they are there to
fix this kind of cache problem.

4) We could redefine (since it has not been really defined yet)
HasNodataValue to "the stored nodata value is a true nodata value AND the
raster contains at least one of them. That means some tiles would have the
flag set and some others not. After all, what is the usefulness of knowing
that the nodata value is a true nodata value if there are none in the
raster?... In this case I would have just to change gdal2wktraster.py and
ST_SetBandHasNoDataValue() (and maybe add a trigger to
ST_SetBandNodataValue()) so they set the flag of each tile properly.

I hesitate between 3 and 4... 4 is minimal but maybe too minimal and make us
lose some important information. 3 seems to give too much importance to the
nodata value problem.

Pierre

_______________________________________________
postgis-devel mailing list
postgis-devel at postgis.refractions.net
http://postgis.refractions.net/mailman/listinfo/postgis-devel





More information about the postgis-devel mailing list