[Qgis-developer] Aggregates within expression engine
nyall.dawson at gmail.com
Tue Mar 15 23:17:20 PDT 2016
This has been discussed a few times in the past, but a possible
funding opportunity has arisen and I'd like to get things moving on
implementing aggregates in the expression engine for 2.16.
Before I start a QEP, I thought an informal discussion of the
approach would be a good idea.
My thoughts are:
- Aggregates are calculated when a QgsExpression is first evaluated
(and only if the aggregate is required). The calculated aggregate will
then be cached within the QgsExpressionContext the expression was
evaluated against, so further evaluations of the expression will not
pay the cost of recalculating the aggregate (e.g. when using the field
calculator to set a field to its value divided by the sum of that
field's values across the layer, the sum is calculated only once).
This approach also avoids the aggregate's value changing part-way
through updating multiple features (e.g. when using the field
calculator to set a field's value to "field"/sum).
- Create a new class QgsAggregateCalculator with methods for
calculating aggregates from a QgsFeatureIterator and field name. This
should make it straightforward to calculate aggregates for an entire
layer, for selected features, or for related features.
- For numeric fields the existing QgsStatisticalSummary class will be
used to calculate the statistic. For string fields a new
QgsStringStatistics class will be created which will operate on
QStringLists and calculate statistics such as the max/min and
concatenations of these strings. Similarly a QgsDateTimeStatistics
class will be created for date/time statistic calculations. (Note that
these classes could also be reused in the statistics panel to show
statistics for string and date/time fields.)
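To make the caching and calculator ideas above concrete, here is a minimal, self-contained Python sketch. It does not use the QGIS API: features are plain dicts, and `AggregateCalculator` and `ExpressionContextCache` are hypothetical stand-ins for the proposed QgsAggregateCalculator and for a cache held by QgsExpressionContext. It shows an aggregate being calculated once on first use and then reused, and numeric versus string fields being dispatched to different statistics (date/time handling is left out for brevity).

```python
import statistics
from typing import Any, Dict, Iterable, List, Tuple

Feature = Dict[str, Any]  # stand-in for a feature's attributes


class AggregateCalculator:
    """Hypothetical stand-in for the proposed QgsAggregateCalculator."""

    # Numeric fields: the role QgsStatisticalSummary would play.
    NUMERIC = {"sum": sum, "mean": statistics.mean, "min": min, "max": max, "count": len}
    # String fields: the role of the proposed QgsStringStatistics.
    STRING = {"min": min, "max": max, "count": len, "concatenate": ",".join}

    def calculate(self, aggregate: str, features: Iterable[Feature], field: str) -> Any:
        # `features` plays the part of a QgsFeatureIterator, so a caller can pass
        # a whole layer, a selection, or related features. Nulls are skipped.
        values = [f[field] for f in features if f.get(field) is not None]
        if not values:
            return 0 if aggregate == "count" else None
        funcs = self.STRING if all(isinstance(v, str) for v in values) else self.NUMERIC
        if aggregate not in funcs:
            raise ValueError(f"unsupported aggregate {aggregate!r} for field {field!r}")
        return funcs[aggregate](values)


class ExpressionContextCache:
    """Hypothetical cache held by the expression context: each aggregate is
    calculated on first request and reused for all later features."""

    def __init__(self, calculator: AggregateCalculator, features: List[Feature]):
        self._calculator = calculator
        self._features = features
        self._cache: Dict[Tuple[str, str], Any] = {}
        self.calculations = 0  # counts actual (uncached) calculations

    def aggregate(self, aggregate: str, field: str) -> Any:
        key = (aggregate, field)
        if key not in self._cache:
            self.calculations += 1
            self._cache[key] = self._calculator.calculate(aggregate, self._features, field)
        return self._cache[key]


# Field calculator setting "pop" to "pop" / sum("pop") on every feature.
features = [{"pop": 10, "name": "b"}, {"pop": 30, "name": "a"}, {"pop": 60, "name": "c"}]
ctx = ExpressionContextCache(AggregateCalculator(), features)
for feature in features:
    # The sum (100) is calculated once, before any value changes, so later
    # features are not divided by a partially updated total.
    feature["pop"] = feature["pop"] / ctx.aggregate("sum", "pop")
```

After the loop the values are 0.1, 0.3 and 0.6 and the sum was calculated exactly once. Without the cache, the second feature would have been divided by 90.1 (the total after the first update), which is the mid-update inconsistency described above.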
Alternative wild ideas:
- Add handling of arrays to the expression engine, calculate the
aggregates from arrays, and then add methods to the expression engine
that iterate through a layer and return an array of values?
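The array alternative can be sketched the same way: one function returns every value of a field as a list, and ordinary functions then reduce that list. The names `layer_values`, `array_sum` and `array_concatenate` are hypothetical and are not existing QGIS expression functions; a layer is again modelled as a list of dicts.

```python
from typing import Any, Dict, Iterable, List


def layer_values(features: Iterable[Dict[str, Any]], field: str) -> List[Any]:
    """Hypothetical: iterate a layer and return one field's non-null values as an array."""
    return [f[field] for f in features if f.get(field) is not None]


def array_sum(values: List[float]) -> float:
    return sum(values)


def array_concatenate(values: List[str], delimiter: str = ",") -> str:
    return delimiter.join(values)


features = [{"pop": 10, "name": "b"}, {"pop": None, "name": "a"}, {"pop": 60, "name": "c"}]
# Composes the way array_sum(layer_values("pop")) might in an expression.
total = array_sum(layer_values(features, "pop"))
names = array_concatenate(layer_values(features, "name"), "|")
```

The appeal is that any array function automatically becomes usable as an aggregate; the trade-off is that every aggregate first materialises the whole column as an array before reducing it.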
I'm also torn regarding the best syntax to use for aggregates within
expressions. I'm unsure whether a traditional SQL "group by" clause
would be a good fit within the existing QGIS expression syntax (e.g.
sum("some_field") group by "some_other_field"). To me it doesn't fit
with the functional approach that the expressions currently take. On
the other hand, implementing this as functions would result in some
very clumsy expressions, such as aggregate('sum', "some_field",
"some_other_field") or sum("some_field", "some_other_field"). Has
anyone got other ideas for a syntax that would be a good fit?
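Whichever syntax is chosen, the grouped form has to mean the same thing: for the feature being evaluated, aggregate "some_field" over all features that share its "some_other_field" value. A Python sketch of those semantics for the function-style candidate aggregate('sum', "some_field", "some_other_field") follows; `GroupedAggregates` and `REDUCERS` are hypothetical, and the current feature is passed explicitly where an expression would use it implicitly.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Feature = Dict[str, Any]

# Hypothetical set of supported aggregates.
REDUCERS: Dict[str, Callable[[List[Any]], Any]] = {"sum": sum, "count": len, "max": max}


class GroupedAggregates:
    """Evaluates aggregate(name, field, group_by) for a given current feature."""

    def __init__(self, features: List[Feature]):
        self._features = features
        self._cache: Dict[Tuple[str, str, str], Dict[Any, Any]] = {}

    def aggregate(self, name: str, field: str, group_by: str, current: Feature) -> Any:
        key = (name, field, group_by)
        if key not in self._cache:
            # One pass over the layer computes the result for every group at once.
            groups: Dict[Any, List[Any]] = defaultdict(list)
            for f in self._features:
                if f.get(field) is not None:
                    groups[f.get(group_by)].append(f[field])
            self._cache[key] = {g: REDUCERS[name](vals) for g, vals in groups.items()}
        # Groups containing only nulls have no entry and yield None.
        return self._cache[key].get(current.get(group_by))


parcels = [
    {"area": 5.0, "zone": "A"},
    {"area": 3.0, "zone": "B"},
    {"area": 2.0, "zone": "A"},
]
agg = GroupedAggregates(parcels)
# aggregate('sum', "area", "zone") evaluated for each parcel in turn.
zone_totals = [agg.aggregate("sum", "area", "zone", p) for p in parcels]
```

Here `zone_totals` is [7.0, 3.0, 7.0]. Computing all groups in a single pass means a field calculator run iterates the layer once per distinct aggregate rather than once per feature, regardless of which surface syntax maps onto it.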
Given how useful these will be, and the wide variety of possible uses
they will have, I'd like to gather everyone's thoughts on how
aggregates should be added to QGIS and all the potential use cases
they'd like me to address with this work.
So, thoughts and feedback welcome, no matter how off-the-cuff they are! ;)