[gdal-dev] Call for discussion on RFC 63 : Sparse datasets improvements
Even Rouault
even.rouault at spatialys.com
Sat Jul 9 02:38:07 PDT 2016
On Saturday 09 July 2016 10:35:21, James Ramm wrote:
> Definitely a welcome improvement!
> Especially useful where the read window does not match natural block size
> (so the current sparse functionality doesn't help much). It improved the
> efficiency of a blob extraction algorithm (which requires overlapping
> windows and thus cannot use natural blocks) we use by around 20-45%
> depending on our data.
>
> You are able to find the percentage of non null values without reading data,
> is that correct? I don't fully understand how it works yet,
For GeoTIFF, the unit of query remains the block. So if a block is present,
it computes how many pixels of the requested window intersect the block, and
counts them as present. Pixels that fall in a missing block are counted as
empty.
Imagine that you have a raster of dimensions 20x20, with tiles of size 10x10.
Imagine the first block (pixels 0,0 to 9,9) is missing, and you call
GetDataCoverageStatus() with (5,5,15,15), i.e. a 15x15 window starting at
pixel (5,5). The 5x5 part of the window that falls in the missing block is
not counted. You then have a contribution of 10x5 valid pixels from the
top-right block, 5x10 from the bottom-left block and 10x10 from the
bottom-right block, hence a non-null percentage of 200 / 225 * 100 = 88.9 %
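To make the arithmetic above concrete, here is a minimal sketch in plain
Python. It is not the GDAL implementation: block_coverage() and its
parameters are hypothetical names used for illustration only, and it assumes
the raster size is an exact multiple of the block size.

```python
def block_coverage(window, block_size, missing_blocks):
    """Return (valid_pixels, total_pixels, percent) for a request window.

    window: (xoff, yoff, xsize, ysize) in pixels.
    block_size: (block_width, block_height).
    missing_blocks: set of (block_col, block_row) absent from the file.
    """
    xoff, yoff, xsize, ysize = window
    bw, bh = block_size
    valid = 0
    # Visit every block that the window touches.
    for brow in range(yoff // bh, (yoff + ysize - 1) // bh + 1):
        for bcol in range(xoff // bw, (xoff + xsize - 1) // bw + 1):
            if (bcol, brow) in missing_blocks:
                continue  # a missing block contributes no valid pixel
            # Size of the intersection between the window and this block.
            w = min(xoff + xsize, (bcol + 1) * bw) - max(xoff, bcol * bw)
            h = min(yoff + ysize, (brow + 1) * bh) - max(yoff, brow * bh)
            valid += w * h
    total = xsize * ysize
    return valid, total, 100.0 * valid / total


# The 20x20 raster with 10x10 tiles and a missing first block.
valid, total, pct = block_coverage((5, 5, 15, 15), (10, 10), {(0, 0)})
print(valid, total, round(pct, 1))  # 200 225 88.9
```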
For VRT, the computation starts with a polygon with the shape of the query
window. Each time a source intersects that polygon, the contribution of this
source is removed with a Difference() geometry operation. At the end, you
compute the area of the resulting polygon, which is the part of the window
not covered by any source.
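As an illustration of the VRT idea, the sketch below computes the area of the
window left uncovered by a set of sources. Note that it does not call
Difference() on real geometries: to stay dependency-free it assumes every
source is an axis-aligned rectangle and uses a simple grid built from the
rectangle edges instead. uncovered_area() is a hypothetical helper, not part
of GDAL.

```python
def uncovered_area(window, sources):
    """Area of `window` not covered by any rectangle in `sources`.

    window and each source are (xmin, ymin, xmax, ymax) tuples.
    """
    wx0, wy0, wx1, wy1 = window
    # Clip each source to the window; ignore sources that do not intersect it.
    clipped = []
    for sx0, sy0, sx1, sy1 in sources:
        cx0, cy0 = max(sx0, wx0), max(sy0, wy0)
        cx1, cy1 = min(sx1, wx1), min(sy1, wy1)
        if cx0 < cx1 and cy0 < cy1:
            clipped.append((cx0, cy0, cx1, cy1))
    # Every rectangle edge becomes a grid line, so each grid cell is either
    # fully inside or fully outside any given source.
    xs = sorted({wx0, wx1, *(c for r in clipped for c in (r[0], r[2]))})
    ys = sorted({wy0, wy1, *(c for r in clipped for c in (r[1], r[3]))})
    area = 0
    for x0, x1 in zip(xs, xs[1:]):
        for y0, y1 in zip(ys, ys[1:]):
            covered = any(
                r[0] <= x0 and x1 <= r[2] and r[1] <= y0 and y1 <= r[3]
                for r in clipped
            )
            if not covered:
                area += (x1 - x0) * (y1 - y0)
    return area


# Same layout as the GeoTIFF example, expressed as three sources.
window = (5, 5, 20, 20)
sources = [(10, 0, 20, 10), (0, 10, 10, 20), (10, 10, 20, 20)]
empty = uncovered_area(window, sources)
print(empty)  # 25
print(round(100.0 * (225 - empty) / 225, 1))  # 88.9
```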
> but would it be
> possible to retrieve the indices of non null data?
You can do that by calling GetDataCoverageStatus() iteratively, with a window
size that is not too small (the block size, for example).
For VRT such a list couldn't be established since the shape of holes can be
completely arbitrary depending on the relative location of sources.
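A rough sketch of that block-by-block iteration for the GeoTIFF case follows.
blocks_with_data() is a hypothetical helper, and the query callable stands in
for the band method so that the example runs without GDAL installed. The
flag values are assumed to mirror the GDAL_DATA_COVERAGE_STATUS_* constants
proposed in the RFC; with the real bindings you would use the constants from
the gdal module rather than these local copies.

```python
# Assumed to mirror the GDAL_DATA_COVERAGE_STATUS_* values from the RFC.
UNIMPLEMENTED = 0x01
DATA = 0x02
EMPTY = 0x04


def blocks_with_data(query, raster_size, block_size):
    """List the (xoff, yoff, xsize, ysize) windows that may contain data.

    query: callable (xoff, yoff, xsize, ysize) -> (flags, percent).
    """
    width, height = raster_size
    bw, bh = block_size
    found = []
    for yoff in range(0, height, bh):
        for xoff in range(0, width, bw):
            # Edge blocks may be smaller than the nominal block size.
            xsize = min(bw, width - xoff)
            ysize = min(bh, height - yoff)
            flags, _pct = query(xoff, yoff, xsize, ysize)
            # If the driver cannot answer, assume the block may hold data.
            if flags & (DATA | UNIMPLEMENTED):
                found.append((xoff, yoff, xsize, ysize))
    return found


def fake_query(xoff, yoff, xsize, ysize):
    """Stand-in for the 20x20 example: only the first block is missing."""
    if (xoff, yoff) == (0, 0):
        return EMPTY, 0.0
    return DATA, 100.0


print(blocks_with_data(fake_query, (20, 20), (10, 10)))
# [(10, 0, 10, 10), (0, 10, 10, 10), (10, 10, 10, 10)]
```

With the Python bindings, the callable would presumably be
band.GetDataCoverageStatus, with the sizes taken from the dataset and from
band.GetBlockSize(). Only the windows returned need to be read to look for
non-null pixels.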
> In the case where your data is very sparse but each block still contains a
> small number of pixels, you would still need to loop through all that null
> data.
You mean that in an extreme situation you could have datasets where no block
is completely null, but each has just one valid pixel for example? In that
case this RFC wouldn't help at all, as those blocks would be seen as data.
You really need to fetch the data to know which pixels are null or not.
> If it were possible to retrieve the non null indices that could be
> useful, at least to speed up python apps...
>
> On 8 Jul 2016 4:27 p.m., "Even Rouault" <even.rouault at spatialys.com> wrote:
> > Hi,
> >
> > The topic of sparse dataset management comes back regularly, so I've
> > decided to
> > tackle it.
> >
> > Please find
> > https://trac.osgeo.org/gdal/wiki/rfc63_sparse_datasets_improvements
> > for review.
> >
> > Even
> >
> > --
> > Spatialys - Geospatial professional services
> > http://www.spatialys.com
> > _______________________________________________
> > gdal-dev mailing list
> > gdal-dev at lists.osgeo.org
> > http://lists.osgeo.org/mailman/listinfo/gdal-dev
--
Spatialys - Geospatial professional services
http://www.spatialys.com