[gdal-dev] Histograms without null pixels

Caleb Hanger cdhanger at gmail.com
Tue Mar 5 07:30:19 PST 2013

On Tue, Mar 05, 2013 at 01:31:01PM +0530, Chaitanya kumar CH wrote:
> On Tue, Mar 5, 2013 at 11:53 AM, Caleb Hanger <cdhanger at gmail.com> wrote:
> > What do you mean?  The histogram is only precomputed if gdalinfo has been
> > run previously and stored a cache of data (the XML file) for later
> > reference; please correct me if I'm wrong.  Otherwise, it seems to me that
> > the data *must* be analyzed and the histogram computed; there is no way
> > around that.  Additionally, I don't think it makes sense to say that it is faster
> > to assemble a histogram that includes out-of-range values than to assemble
> > a histogram that does not, because the latter is a subset contained within
> > the former.
> >
> Some raster formats can store the histogram data as metadata. Also, there
> is a shortcut; overviews can be used to get approximate values faster.

Understood, thanks for pointing that out.  In that case, however, the responsibility for deciding whether to include out-of-range values lies with the raster format, correct?

> You can write a simple python script to get the histogram. You can get a
> good idea at
> http://trac.osgeo.org/gdal/browser/trunk/autotest/gcore/histogram.py#L103

Right, of course I can write my own utility that links against the GDAL libraries and calls GetHistogram the way I'd like, and I will probably end up doing so, but in C.  My main goal at the moment is a bash shell script that accomplishes a bigger task, so a Python script would still have to be invoked as an external program.  Either language would be well suited to the smaller purpose of getting and analyzing the histogram.  I merely found it odd that gdalinfo does not let the user turn off a behavior that doesn't seem to make sense for most applications.
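For reference, a minimal sketch of what such a helper might look like with the GDAL Python bindings.  The function names, the default range, and the choice of band 1 are assumptions made for illustration; only Band.GetHistogram and its include_out_of_range argument come from GDAL itself.

```python
def in_range_histogram(band, lo=-0.5, hi=255.5, buckets=256):
    """Return bucket counts for a band, discarding out-of-range pixels.

    `band` is expected to be an osgeo.gdal.Band.  The defaults for lo, hi
    and buckets are illustrative, not taken from any particular dataset.
    """
    return band.GetHistogram(
        min=lo,
        max=hi,
        buckets=buckets,
        include_out_of_range=0,  # drop values outside [lo, hi)
        approx_ok=0,             # exact counts; do not sample overviews
    )


def print_histogram(path):
    """Hypothetical entry point: print the histogram of band 1 of `path`."""
    from osgeo import gdal  # requires the GDAL Python bindings

    ds = gdal.Open(path)
    if ds is None:
        raise SystemExit("could not open " + path)
    counts = in_range_histogram(ds.GetRasterBand(1))
    print(" ".join(str(c) for c in counts))
```

A shell script would still have to run this as a separate process and parse the printed counts, which is the overhead I'd rather avoid.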

> > Yes: quite simply, a histogram that does not include illegitimate values
> > makes more sense than a histogram that does include them.  At least, this
> > is the case in my experience; perhaps there are situations I'm unfamiliar
> > with in which the histogram is desired to include values outside of the
> > histogram's range, for some reason.  I'd even go so far as to say that
> > gdalinfo currently *lies* about the histogram, telling the user that the
> > histogram shows "256 buckets from X to Y" even though the histogram
> > includes values which are *outside of that range*.
> >
> > Arguably an even better justification is that the machinery to exclude the
> > illegitimate values is already present in GetHistogram, so *very* few lines
> > of code would need to be added to gdalinfo, simply to provide a flag for
> > the user and if that flag is specified, pass "false" for the right
> > parameter to GetHistogram (bIncludeOutOfRange), instead of just calling
> > GetDefaultHistogram.
> >
> That sounds reasonable. If you can, you should submit a patch or a request
> at http://trac.osgeo.org/gdal/newticket
> Note that the functionality should include the facility to include options
> to mention the min/max values.

Thanks; I will post a patch when I have a chance.  When requesting a histogram in gdalinfo, the min/max values are already mentioned, and this will just be an addition to, and modification of, that functionality.
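To make the complaint concrete, here is a small pure-Python sketch of the two behaviors.  It is not GDAL's code; it only mimics the documented effect of bIncludeOutOfRange, where out-of-range values are either folded into the first and last buckets or discarded.  The sample pixel values and the -9999/9999 fill values are made up for the example.

```python
import math


def histogram(values, lo, hi, buckets, include_out_of_range):
    """Count values into equal-width buckets over the half-open range [lo, hi)."""
    counts = [0] * buckets
    scale = buckets / (hi - lo)
    for v in values:
        i = math.floor((v - lo) * scale)
        if i < 0 or i >= buckets:
            if not include_out_of_range:
                continue  # discard the value entirely
            # Fold the value into the nearest end bucket.
            i = 0 if i < 0 else buckets - 1
        counts[i] += 1
    return counts


# Hypothetical pixels: four real values plus out-of-range fill values.
pixels = [-9999, -9999, 10, 20, 200, 250, 9999]

print(histogram(pixels, 0, 256, 4, True))   # [4, 0, 0, 3]
print(histogram(pixels, 0, 256, 4, False))  # [2, 0, 0, 2]
```

With the flag on, the end buckets are inflated by values that are nowhere near 0 to 256, which is exactly what makes the "buckets from X to Y" label misleading.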
