[Qgis-developer] QEP/RFC sqlite virtual tables

Hugo Mercier hugo.mercier at oslandia.com
Wed Oct 22 05:11:37 PDT 2014


Le 22/10/2014 11:21, Matthias Kuhn a écrit :
> On 22.10.2014 10:38, Hugo Mercier wrote:
>> Well, QgsExpression has been designed to mimick the SQL WHERE clause, so
>> yes I agree.
>> My point is that if you want to extend expression to support joins,
>> aggregates, window functions, etc. you end up recreating SQL, but
>> probably a less powerful one.
> You can call it mimicking, you can also call it implementing it's own
> SQL (subset) dialect.

Sorry for my english, "mimicking" was not a negative term for me.

> Yes, you end up recreating SQL. How powerful it will be is hard to say
> right now, right?

Sure. But if you want it to be as powerful as other SQL engines, lots of
work is needed (parsing, planning, optimizing). Work that has already
been done, I think.

>>
>> So for joins, since the information comes from QGIS ("I want to join
>> tables t1 and t2 on this column predicate"), the SQL query can be
>> deduced and some other metadata is also known, like the fact that t1 and
>> t2 may be from the very same database.
>> In that case the virtual layer may be created with some metadata saying
>> that this is in fact a join that must be optimized. And then the virtual
>> layer provider can ask the underlying provider (say postgis) if it can
>> process the whole join query.
> 
> I am not only referring to the current relations implementation here. I
> don't think aggregate functions need necessarily be based on creating a
> relation definition in the project file (Just like in a database you
> also don't need a foreign key constraint to create join).

Not sure to understand. You mean there would be no need to create a
virtual layer just to execute an aggregate ? Exact.
But if the virtual layer concept is complete enough, then every single
layer can be seen as a special case of a (trivial) virtual layer, then
to create an aggregate, you won't need to create a new virtual layer.

> 
> But if I want to create a new layer or an ad-hoc iterator that joins
> data or makes use of other advanced relational database features, it
> would be nice if QGIS could forward such requests to the database. Say I
> want to create a symbology based on the amount of waste within
> countries, QGIS should be able to execute this request on the database.
> In this scenario you could likely want to define this request inside an
> expression (e.g. a virtual column).
> So basically, I think we need a functionality that is able to work as a
> database abstraction layer. Forward wherever possible to the db, use a
> (local) fallback implementation where not.

I agree, but can you be more specific on your example ? How many tables
involved ? Which fields ? What kind of joins ?

> 
> I am not opposed to use SQLite for this. There is no reason for us to
> reinvent the wheel, the NIH syndrome should not be our driver. But I see
> some limitations and I would rather prefer to have these properly
> addressed. And I will be more than happy if you tell me that you
> convinced the sqlite virtual table developers that they should extend
> their API to expose the parsed SQL structure and allow to tweak it. Then
> we have a fallback implementation which is well-tested but we still have
> the possibility to fine-tune requests for big databases.

Splitting the original query in local and remote queries seems to me
very hard in the general case. It would require not only the parsed SQL
AST, but some information about the execution plan. And the plan may
depends on statistics on the data. And even when you know the plan, I
think it is still hard to translate.

Generally speaking if the user tries to do a very complex query with
lots of different data sources, it is hard to guarantee good
performances, but I don't see it as a major problem. If you want
something complex AND fast, then use SQLite or PostGIS.

So, I would vote for optimizations that can be done for simple, defined
scenarios, like the "join" case given by the current join or relation
feature of QGIS : no nested queries, no group by, no order by, no CTE,
etc. That particular case could be detected when the join is created,
and optimizations can be created, like creating a (temporary) postgis
layer with sql=the join query.

I may be missing something. Are there other simple cases where the
optimization is well defined ?

To put it differently, I don't see the lack of knowledge about parsed
SQL query as a blocker for the implementation of this virtual layer
concept. A general cross-database SQL optimizer would need lots of work
(in QGIS and/or SQLite) and may come later (if really needed).

The other option would be to say we don't need everything SQLite
provides, and we won't never need it. So then we can define our own
subset of SQL and it would probably be easier to optimize. But I can't
see why we would want to limit the features :) it is usually the opposite.



More information about the Qgis-developer mailing list