[gdal-dev] OGR SQL, COUNT([DISTINCT] xx) and NULL values

Even Rouault even.rouault at mines-paris.org
Fri Nov 25 17:14:31 EST 2011


Hi,

http://trac.osgeo.org/gdal/ticket/4354 :

"""Currently, OGR SQL considers COUNT(a_column) as a synonym of COUNT(*). 
However the standard mandates that COUNT(a_column) only count non-NULL values, 
whereas COUNT(*) count all rows (after the effect of WHERE clause of course). 
So COUNT(a_column) should not be considered as equivalent to COUNT(*) 
The same applies for COUNT(DISTINCT a_column): it should only count unique 
non-NULL values. 
What is a bit annoying here is that ogr_sql.html says that : "As a special 
case, the COUNT() operator can be given a "*" argument instead of a field name 
which is a short form for count all the records though it would get the same 
result as giving it any of the column names [...]" 
If we fix the current behaviour to be conformant with the standard, it can 
break assumptions made by users. But currently, there's an inconsistency if 
you look at the result of -sql "select sum(foo), count(foo), avg(foo) from 
bar" when there are null values.
"""
Perhaps only fix that in trunk (the current behaviour existed in 1.7 and likely 
previous versions, not only in 1.8) ? and in the doc, add a note about the 
difference between new and old behaviour ?


More information about the gdal-dev mailing list