[gdal-dev] Call for discussion on RFC 63 : Sparse datasets improvements

Sat Jul 9 02:38:07 PDT 2016

On Saturday 09 July 2016 10:35:21, James Ramm wrote:
> Definitely a welcome improvement!
> Especially useful where the read window does not match natural block size
> (so the current sparse functionality doesn't help much). It improved the
> efficiency of a blob extraction algorithm (which requires overlapping
> windows and thus cannot use natural blocks) we use by around 20-45%
> depending on our data.
> 
> You are able to find the percentage of non-null values without reading data,
> is that correct? I don't fully understand how it works yet,

For GeoTIFF, the unit of query remains the block. So if a block is present, it 
computes how many pixels of the request window intersect the block, and counts 
them as present.

Imagine that you have a raster of dimensions 20x20, with tiles of size 10x10. 
Imagine the first block (pixels 0,0 to 9,9) is missing, and you call 
GetDataCoverageStatus() with (5,5,15,15), i.e. a 15x15 window starting at pixel 
5,5. Then you will have a contribution of 10x5 valid pixels from the top-right 
block, 5x10 from the bottom-left block and 10x10 from the bottom-right block, 
hence a non-null percentage of 200. / 225. * 100 = 88.9 %
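
The arithmetic above can be sketched in a few lines of Python. This is only an 
illustration of the per-block counting, not the code of the GeoTIFF driver; the 
function name and its parameters are made up for the example.

```python
def window_data_percentage(block_size, missing_blocks, window):
    """Percentage of the window covered by blocks that are present.

    block_size:     (width, height) of a block, in pixels
    missing_blocks: set of (block_col, block_row) absent from the file
    window:         (xoff, yoff, xsize, ysize) of the request, in pixels
    """
    bw, bh = block_size
    xoff, yoff, xsize, ysize = window
    x_end, y_end = xoff + xsize, yoff + ysize
    present = 0
    # Visit every block touched by the window
    for brow in range(yoff // bh, (y_end - 1) // bh + 1):
        for bcol in range(xoff // bw, (x_end - 1) // bw + 1):
            if (bcol, brow) in missing_blocks:
                continue
            # Pixels of this block that fall inside the window
            w = min(x_end, (bcol + 1) * bw) - max(xoff, bcol * bw)
            h = min(y_end, (brow + 1) * bh) - max(yoff, brow * bh)
            present += w * h
    return 100.0 * present / (xsize * ysize)


# 10x10 blocks, first block missing, window (5,5,15,15)
pct = window_data_percentage((10, 10), {(0, 0)}, (5, 5, 15, 15))
print(round(pct, 1))  # 88.9
```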

For VRT, the computation starts with a polygon that has the shape of the query 
window. Each time a source intersects that polygon, the contribution of this 
source is removed with a Difference() geometry operation. At the end, you 
compute the area of the resulting polygon, which is the part of the window not 
covered by any source.
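
To make the idea concrete, here is a simplified Python sketch. It assumes that 
the window and the sources are axis-aligned rectangles, so a small rectangle 
subtraction stands in for the general Difference() polygon operation; the 
function names and the source layout are hypothetical (the layout just mirrors 
the three present blocks of the GeoTIFF example).

```python
def subtract_rect(rect, hole):
    """Parts of rect not covered by hole, as disjoint rectangles.

    Rectangles are (xmin, ymin, xmax, ymax).
    """
    x0, y0, x1, y1 = rect
    hx0, hy0, hx1, hy1 = hole
    # No overlap: rect is untouched
    if hx0 >= x1 or hx1 <= x0 or hy0 >= y1 or hy1 <= y0:
        return [rect]
    pieces = []
    if hy0 > y0:  # full-width strip before the hole along y
        pieces.append((x0, y0, x1, hy0))
    if hy1 < y1:  # full-width strip after the hole along y
        pieces.append((x0, hy1, x1, y1))
    ya, yb = max(y0, hy0), min(y1, hy1)
    if hx0 > x0:  # strip on the low-x side, limited to the hole's y range
        pieces.append((x0, ya, hx0, yb))
    if hx1 < x1:  # strip on the high-x side, limited to the hole's y range
        pieces.append((hx1, ya, x1, yb))
    return pieces


def uncovered_area(window, sources):
    """Area of the window left after removing every source."""
    remaining = [window]
    for src in sources:
        remaining = [p for r in remaining for p in subtract_rect(r, src)]
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in remaining)


window = (5, 5, 20, 20)  # 15x15 window starting at 5,5
sources = [(10, 0, 20, 10), (0, 10, 10, 20), (10, 10, 20, 20)]
empty = uncovered_area(window, sources)
total = (window[2] - window[0]) * (window[3] - window[1])
data_pct = 100.0 * (total - empty) / total
print(empty, round(data_pct, 1))  # 25 88.9
```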

> but would it be
> possible to retrieve the indices of non null data?

You can do that by calling GetDataCoverageStatus() iteratively, with a window 
size that is not too small (e.g. the block size). 
For VRT such a list couldn't be established since the shape of holes can be 
completely arbitrary depending on the relative location of sources.
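
A rough sketch of such a loop in Python follows. The helper name and the 
fake_status() stand-in are hypothetical; the only assumption about the real API 
is that the status call takes (xoff, yoff, xsize, ysize) and returns a flags 
value together with a percentage, with 0x02 meaning "data" and 0x04 meaning 
"empty" as in the GDAL_DATA_COVERAGE_STATUS_* constants.

```python
# Assumed to match GDAL_DATA_COVERAGE_STATUS_DATA / _EMPTY
STATUS_DATA = 0x02
STATUS_EMPTY = 0x04


def windows_with_data(get_status, raster_size, block_size):
    """Yield the block-sized windows that are reported as containing data.

    get_status: callable (xoff, yoff, xsize, ysize) -> (flags, percentage)
    """
    width, height = raster_size
    bw, bh = block_size
    for yoff in range(0, height, bh):
        for xoff in range(0, width, bw):
            # Clamp the last row/column of blocks to the raster extent
            xsize = min(bw, width - xoff)
            ysize = min(bh, height - yoff)
            flags, _pct = get_status(xoff, yoff, xsize, ysize)
            if flags & STATUS_DATA:
                yield (xoff, yoff, xsize, ysize)


def fake_status(xoff, yoff, xsize, ysize):
    """Hypothetical stand-in: 20x20 raster, 10x10 blocks, first one missing."""
    if (xoff, yoff) == (0, 0):
        return STATUS_EMPTY, 0.0
    return STATUS_DATA, 100.0


windows = list(windows_with_data(fake_status, (20, 20), (10, 10)))
print(windows)  # [(10, 0, 10, 10), (0, 10, 10, 10), (10, 10, 10, 10)]
```

With a real dataset one would pass the band's own GetDataCoverageStatus method 
instead of fake_status(), and only read the windows that are yielded.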

> In the case where your data is very sparse but each block still contains a
> small number of pixels, you would still need to loop through all that null
> data. 
You mean that in an extreme situation you could have datasets where no block 
is completely null, but each has just one valid pixel for example? In that 
case this RFC wouldn't help at all, as those blocks would be seen as data. 
You really need to fetch the data to know which pixels are null or not.

> If it were possible to retrieve the non null indices that could be
> useful, at least to speed up python apps...
> 
> On 8 Jul 2016 4:27 p.m., "Even Rouault" <even.rouault at spatialys.com> wrote:
> > Hi,
> > 
> > The topic of sparse dataset management comes up regularly, so I've
> > decided to tackle it.
> > 
> > Please find
> > https://trac.osgeo.org/gdal/wiki/rfc63_sparse_datasets_improvements
> > for review.
> > 
> > Even
> > 
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com

-- 
Spatialys - Geospatial professional services
http://www.spatialys.com