[gdal-dev] gdalinfo -mm also report n (number of grid cells that are not nodata)

Markus Metz markus.metz.giswork at gmail.com
Sun Jun 17 10:22:10 PDT 2018


On Sat, Jun 16, 2018 at 10:00 PM, Even Rouault <even.rouault at spatialys.com>
wrote:
>
> >
> > I checked, results with gdalinfo -stats are wrong because existing
> > STATISTICS_* metadata are reported even if approximate statistics are
> > not allowed.
>
> No, if STATISTICS_APPROXIMATE=YES is set in .aux.xml (because the initial
> computation was done with -approx_stats) and you do gdalinfo -stats after,
> then statistics will be recomputed on all samples and
> STATISTICS_APPROXIMATE=YES will be cleared.

IMHO, metadata cannot be trusted because it is often not known which
software generated the metadata, or with which method. The only safe
assumption is that statistics in metadata can be approximations. Therefore
statistics from metadata should only be reported if approximations are ok,
no matter whether STATISTICS_APPROXIMATE=YES exists or not.

>
> > The problem is, STATISTICS_APPROXIMATE is not set. Other software
> > using GDAL to create raster datasets may use
> > GDALRasterBand::SetStatistics() which does not indicate if stats are
> > approximations, i.e. stats are approximations but there is no
> > STATISTICS_APPROXIMATE=YES.
>
> The idea is that if you use GDALRasterBand::SetStatistics() then you are
> assumed to provide exact statistics. If they are only approximate, then
> you should also set STATISTICS_APPROXIMATE=YES with GDALSetMetadataItem()

What about 1) all the datasets that have already been created, 2) all the
third-party software packages providing some sort of statistics in metadata?

IMHO, the STATISTICS_APPROXIMATE=YES mechanism does not work because of 1)
and 2).
>
> >
> > GDAL assumes that STATISTICS_* metadata represent stats on all pixels,
> > this is IMHO wrong. You can only hope that STATISTICS_* metadata
> > represent stats on all pixels if a respective metadata item has been
> > set to boolean true, something like STATISTICS_ALL_PIXELS=YES.
>
> I'm really confused. Why introduce yet another item when
> STATISTICS_APPROXIMATE=YES is there for that purpose?

Simpler: statistics in metadata can be approximations, even if
STATISTICS_APPROXIMATE=YES is not set. If exact statistics are requested,
scan all pixels. With that rule, a metadata item like
STATISTICS_APPROXIMATE or STATISTICS_ALL_PIXELS is not needed.

>
> > Even in this case, an option to
> > force recomputing raster band stats would be very nice to have
> > (verifying metadata).
>
> ComputeStatistics() will recompute statistics. It is true that with
> gdalinfo -stats, they are not recomputed if they already exist and were
> not approximate since it calls GetStatistics() and not
> ComputeStatistics(). An easy workaround is to delete the .aux.xml to
> force recomputation.

At least recomputation of min/max can already be forced with gdalinfo -mm.

It would be nice if gdalinfo -stats also triggered forced recomputation
of exact statistics. Currently there is no difference between gdalinfo with
and without -stats if statistics already exist in metadata and
STATISTICS_APPROXIMATE=YES is absent (the standard case for existing data):
-stats has no effect here. Forced recomputation would be a change in the
behaviour of gdalinfo -stats.
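
To make the difference concrete, here is a minimal Python sketch of the two
decision rules as they are described in this thread. It is only a model: it
does not call GDAL, and the function names (stats_from_metadata,
current_policy, proposed_policy) and the scan_all_pixels callback are
hypothetical names made up for illustration. Only the STATISTICS_* metadata
keys are taken from the discussion above.

```python
from typing import Callable, Dict, Optional, Tuple

Stats = Tuple[float, float, float, float]  # (min, max, mean, stddev)

STAT_KEYS = (
    "STATISTICS_MINIMUM",
    "STATISTICS_MAXIMUM",
    "STATISTICS_MEAN",
    "STATISTICS_STDDEV",
)


def stats_from_metadata(md: Dict[str, str]) -> Optional[Stats]:
    """Return cached statistics as floats, or None if any item is missing."""
    if not all(key in md for key in STAT_KEYS):
        return None
    return tuple(float(md[key]) for key in STAT_KEYS)


def current_policy(md: Dict[str, str], approx_ok: bool,
                   scan_all_pixels: Callable[[], Stats]) -> Stats:
    """Rule as described in this thread: cached values are reused unless
    they are flagged approximate and exact statistics are requested."""
    cached = stats_from_metadata(md)
    flagged = md.get("STATISTICS_APPROXIMATE", "").upper() == "YES"
    if cached is not None and (approx_ok or not flagged):
        return cached
    return scan_all_pixels()


def proposed_policy(md: Dict[str, str], approx_ok: bool,
                    scan_all_pixels: Callable[[], Stats]) -> Stats:
    """Proposed rule: cached values are reused only when approximations are
    acceptable; a request for exact statistics always scans all pixels."""
    cached = stats_from_metadata(md)
    if approx_ok and cached is not None:
        return cached
    # Simplification: a real tool could subsample here when approx_ok is
    # true and nothing is cached; this sketch always does the full scan.
    return scan_all_pixels()
```

With unflagged metadata and approx_ok=False, current_policy returns the
cached values while proposed_policy calls scan_all_pixels. That is exactly
the case where -stats currently has no effect.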

The purpose of gdalinfo -approx_stats would be (already is?) to quickly get
stats for a raster band, either from metadata or by approximation.

I am trying to avoid adding another option like gdalinfo -exact-stats.

Markus