[Qgis-developer] have aggregate/window expressions ever been discussed?

Matthias Kuhn matthias.kuhn at gmx.ch
Wed May 28 09:30:30 PDT 2014


On 05/28/2014 10:30 AM, Hugo Mercier wrote:
> Le 28/05/2014 08:35, Matthias Kuhn a écrit :
>> Hi All,
>>
>> As the responsible person for QGIS relations, I feel obliged to share my
>> thoughts in this discussion.
>>
>> First of all a few notes about QGIS expressions and their current state:
>> QGIS expressions are a nice and handy functionality to quickly calculate
>> values based on a single feature in QGIS. This serves a lot of use-cases
>> quick and easy. However, in their current state, they don't allow to do
>> exactly what this thread originates from: aggregate and join data from
>> other layers or use subqueries.
> Yes exactly. We have more or less the WHERE clause with expressions.
> Aggregates, joins and subqueries could be built using something else
> (relations?)
Well relations would need to be integrated into some form of syntax 
first. And IMO this should be integrated nicely into the QgsExpression 
system to avoid having yet another candidate for this thread. The output 
of a relational query should be accepted as a node in a QgsExpression, 
and it should be able to insert a QgsExpression as part of this a 
relational query. So I think the best way is to extend expressions to 
support iterators (and maybe single features?) as return values. But I 
know that there are other opinions out there ;)

>
>> I like the idea of using sqlites virtual tables to have immediate access
>> to a huge base of functionality offered by sqlite to do complex queries.
>> If we introduce this support, we have immediate support for a wide range
>> of database functionality with a few lines of code. The only thing I am
>> not sure (and I don't know the virtual tables implementation enough to
>> answer this question) is if it is able to delegate cross-table queries
>> to the original database. In short: can I do a request that requires
>> data from different tables of the same database and have it executed
>> directly inside the database? My suspicion is no, because the sqlite
>> virtual tables will be known to sqlite as QGIS tables and it will still
>> query the tables through the QGIS provider, therefore calculating e.g. a
>> max functionality by querying the QGIS provider for all features and
>> then calculating the "max" locally and not on the server side. This
>> would be a major performance impact for customers having a single
>> database that could do this calculation for us instead of doing this
>> ourselves.
>>
>> Instead, if we have QGIS expressions (or a QGIS query implementation on
>> top of it) support for this, we are able to tell, if different tables
>> are from the same database and therefore if we are able to delegate the
>> whole join/aggregate/subquery whatever job to the database and let the
>> database do what it's good at.
>>
>> Therefore my question to the folks who know the sqlite virtual table
>> code: is it possible to have sqlite virtual tables forward cross table
>> queries to the database itself? Or is it possible to get access to the
>> parsed query tree (or whatever the name of that may be) to determine
>> based on QGIS side based on the parsed query if we are able to optimize
>> by forwarding to the database.
>>
> AFAIK, you do not have direct access to the query sent to a virtual
> table. But the SQLite engine will ask your virtual table driver what is
> the best way to resolve a constraint on columns.
> For instance, if the original query has a WHERE 'a = 2', then the driver
> will be asked if it knows how to quickly resolve this, using indexes
> (have a look at xBestIndex at http://www.sqlite.org/vtab.html)
> So I guess, you could use the remote database indexes.
> But, yes that would still be not very efficient if you want to query two
> tables of the same postgis database.
>
> In my mind, SQLite virtual tables are interesting for offering a more or
> less relational view on data sources that are not originally designed
> for that. But it would be suboptimal for already powerful databases.
>
> We then have two kinds of "database backends" : real databases (postgis
> and ... what else ?), and pseudo-databases through SQLite virtual tables.
>
> And we would have a conversion from QgsExpression + relations to
> different dialects of SQL (SQLite / PostgreSQL / ...) in each provider ?
> Does it make sense for you ?
It does make sense.

Real Databases: PostgreSQL, Oracle, MSSQL, (sqlite, native SQL through OGR)

What would you like to use virtual tables drivers for?
The idea of having it available as another provider for layer 
definitions sounds good to me anyway.
But should it also be possible to use sqlite syntax wherever we have 
expression syntax integrated now?
Then the UI would be "Field / Expression / SQLite"?
In terms of control over what gets executed where and to translate it to 
native SQL wherever possible.
In terms of quickly available functionality, sqlite virtual tables are 
most likely an easy shot. Is there a possibility to overcome the problem 
of "control"? Would be nice if yes, but I don't know the people behind 
sqlite.

Is it worth adding two different syntaxes that may and will confuse 
users for their advantages? In terms of usability I doubt that it is a 
good idea (Apple would immediately fire any CTO who proposes having this 
redundancy, but then, they have tons more development power than we have 
:-) )

>
>> Concerning joins, merging this code with the relations seems a viable
>> option for me in the long run. Currently, relations do no caching and
>> do not "hard join" the data on the other table (meaning, joined fields
>> are not available as fields) and maybe there are other things missing
>> (I remember a thread here on the ML about this before)
>>
> Yes in my opinion, we should try to have the same code for what we call
> "joins" and more general relations. And on the GUI part, some
> redundancies between these two concepts may have to be reduced as well.
Sure thing. My idea is to allow some more control in the relations 
window. For this purpose, I think a checkbox to "hard join" some fields 
(doesn't need to be all, does it?) should do the trick I think.

Then we can also replace the fields properties "Join" dialog with a 
"Relation" dialog that gives some control over relations that include 
the current layer, so a user doesn't always have to resort to the 
project properties.


More information about the Qgis-developer mailing list