[gdal-dev] OGR SQL, COUNT([DISTINCT] xx) and NULL values
even.rouault at mines-paris.org
Fri Nov 25 17:14:31 EST 2011
"""Currently, OGR SQL considers COUNT(a_column) as a synonym of COUNT(*).
However the standard mandates that COUNT(a_column) only count non-NULL values,
whereas COUNT(*) count all rows (after the effect of WHERE clause of course).
So COUNT(a_column) should not be considered as equivalent to COUNT(*)
The same applies for COUNT(DISTINCT a_column): it should only count unique
What is a bit annoying here is that ogr_sql.html says that : "As a special
case, the COUNT() operator can be given a "*" argument instead of a field name
which is a short form for count all the records though it would get the same
result as giving it any of the column names [...]"
If we fix the current behaviour to be conformant with the standard, it can
break assumptions made by users. But currently, there's an inconsistency if
you look at the result of -sql "select sum(foo), count(foo), avg(foo) from
bar" when there are null values.
Perhaps only fix that in trunk (the current behaviour existed in 1.7 and likely
previous versions, not only in 1.8) ? and in the doc, add a note about the
difference between new and old behaviour ?
More information about the gdal-dev