[Qgis-developer] QEP/RFC sqlite virtual tables

Hugo Mercier hugo.mercier at oslandia.com
Tue Nov 4 05:36:40 PST 2014


Hi,

Some additional information, after some more digging into sqlite code /
documentation / mailing lists:

* The parsing part of SQLite is private and cannot be requested by an
external program. Moreover, the parse phase directly builds a program,
not an abstract syntax tree. However, a formal grammar is available
(http://www.sqlite.org/syntaxdiagrams.html) and the SQLite license is
very permissive. So, reusing parts of the SQLite parser seems not that
hard and would allow to parse a subset of SQL (only SELECT I guess)

* About indexes on virtual tables, contrary to what I wrote previously,
the xBestIndex() method of virtual tables should be enough to orient the
planner, an estimated cost and estimated number of rows can be returned
for each part of the where clause. So there should be no need to copy
native indexes.
But the provider interface should be extended in order to provide such
statistics.

Now for the more concrete part: we may have a funding for that
development, but a decision has to be taken before this funding vanishes.

So in order to advance on that topic, what we can propose is to develop
a fully functional prototype, as a transition before being added to the
core. It will implement a new data provider, with these user-oriented
features :
* an integration into the DBManager, in order to be able to use SQL on
any layer
* the ability to use a WHERE filter on layers that have a join.

For the code part, a SELECT parser will be proposed, based on SQLite if
possible and performance tests will be developed with different use
cases, allowing to test for the planner performances as well as for the
"data copying" part.

We are willing to develop this as a plugin if it can be included in QGIS
as a c++ plugin. Is there any objection to this ?


Le 30/10/2014 10:32, Hugo Mercier a écrit :
> Hi Martin,
> 
> Thanks for your review.
> 
> Le 29/10/2014 17:34, Martin Dobias a écrit :
>> Hi Hugo!
>>
>> On Tue, Oct 28, 2014 at 6:15 PM, Hugo Mercier <hugo.mercier at oslandia.com> wrote:
>>> are there other opinions ?
>>> Other arguments for one or the other side ?
>>
>>
>> I have to say that initially I was very excited about the idea... and
>> after thinking about the details I am getting less excited as there
>> are probably lots of gotchas. Below there is list of questions in the
>> order of reading the proposal.
>>
>> ** Use cases
>>
>> - item 1 - in what sense the layer would be "dynamic"?
> 
> "create a dynamic point layer based on X,Y(,Z) coordinates" => if points
> coordinates in the underlying layer change, then their geometries can be
> updated during the next fetch.
> Like for example the creation of a point layer from a CSV file with the
> "watch file" option.
> 
>>
>> ** Indexes
>>
>> - how will be indexes requested by underlying providers of primary
>> sources? Will you add support to providers to return indices
>> associated with a given primary source? How can sqlite make use of
>> such information? Are you going to replicate the same indices for the
>> virtual tables?
> 
> Yes we will need something like this in order to properly use indexes of
> underlying layers. By populating sqlite_stat* tables.
> And yes I realize that may involve replication of indexes.
> 
> For the case where more than one index is available on a table, the
> xBestIndex method of the virtual table mechanism is made for that: an
> estimated cost can be returned for each constraint
> 
> 
>>
>> - the need to use spatial indices explicitly (with such weird syntax)
>> is not good - this should be done by DBMS, not by the user. In
>> practice nobody will use them and the spatial queries will be slow
> 
> Yes it is the Spatialite spatial index syntax.
> 
>>
>> ** Serialization
>>
>> - I am lost. Are we going to materialize those sqlite views in memory
>> or to disk? I thought they were meant to be just views wrapping QGIS
>> layers
> 
> Yes, they are. They can be part of a project file. The second paragraph
> is misleading.
> 
>>
>> ** Implementation details
>>
>> - regarding QgsVectorLayer constructor: to me it does not look like a
>> case special enough to require a new parameter
>>
>> - what does it mean the provider would be based on spatialite? a
>> subclass of spatialite provider - or a copy of it - or something else?
> 
> Not decided yet. But it is probably too early in the discussion to go
> into such details.
> 
>>
>> - are the insert/update/delete triggers going to be provided by user
>> or generated automatically? (I believe it should be automatic)
> 
> As automatic as possible. But in the general case, it is not possible.
> Some columns cannot be updated (computed during the query). Or there may
> be more than one possibility to "update" a join.
> 
>>
>> - how do we find out that the views from virtual layers are actually editable?
> 
> If an update trigger is defined.
> 
>>
>> - how do we determine whether the view has a geometry - and in case of
>> multiple geometries, which one to choose? (geometry columns are just
>> ordinary blobs, right?)
>>
>> - how do we determine primary key for a view? (for feature IDs)
> 
> Good points. We will need to pass these information to the constructor,
> as it is now with a postgres provider
> 
>>
>> ** Python bindings
>>
>> - how would I create a virtual layer from Python? I would hope to do
>> something like iface.addVectorLayer("SELECT * FROM layerX", "virtual")
>> - would that be possible?
> 
> This kind of syntax (using loaded layer names instead of their explicit
> definition) is clean but would require :
> * that layers are already loaded before we can use them
> * then to save the "loading order" of layers in the project file
> 
> The document refers to the use of explicit layer urls.
> I don't have a strong opinion on this choice.
> 
>>
>> ** User interface
>>
>> - what is the "referenced layers" box for - is it just to inform user
>> which layers he used? Or does he need to explicitly register those
>> "primary" layers somewhere?
> 
> See previous answer
> 
>>
>> ** Performance
>>
>> - how will sqlite plan execution of queries?
>>   - inner join of two virtual tables based on attributes?
>>   - spatial joins?
>> - does sqlite have access to any statistics of virtual tables and
>> indices? These are quite important for reasonable query execution
>> plans.
> 
> See answer more on the top: sqlite_stat* tables, xBestIndex
> 
>>
>> - During normal operation when iterating over vector layer there is
>> conversion: backend -> QgsFeature. As far as I understand the virtual
>> tables add two more conversions during execution: QgsFeature -> sqlite
>> -> QgsFeature (first in virtual table implementation, second in new
>> provider implementation). Is that correct? Have you tried to measure
>> additional overhead?
> 
> Hmmm correct. I did not have any measure yet ...
> 
>>
>>
>> In summary:
>>
>> I have the impression that virtual tables are just a simple way to
>> provide access to external data to sqlite, not an invitation to use
>> sqlite engine with "foreign" data. The provided API simply does not
>> look mature enough. I would not be too optimistic about openness of
>> sqlite project - the developer list is not publicly available and I
>> haven't found a link to repository either (except for the commit log)
>> - but I may be wrong.
> 
> http://osdir.com/ml/sqlite-dev/
> http://www.sqlite.org/cgi/src/tree?ci=trunk
> 
>>
>> Aren't there any other embedded SQL engines that we could use?
> 
> You mean another powerful, embeddable, open source, spatial-enabled, SQL
> engine ?
> I don't see :)
> 
>>
>> Maybe we should have a simple SQL 'SELECT' statement parser that would
>> parse the queries and then decide what to do with them:
>> - execute on backend (if all tables are from the same DB)
>> - execute with existing QGIS functionality (if we have efficient
>> algorithm for that)
>> - execute with sqlite engine (with materialized views)
> 
> Ok, after thinking about issues Matthias and you raised, inclusion of
> the syntax parsing part of (a subset of) SQL in QGIS may be needed,
> especially to generate spatial index queries ...
> 
> I still think we should try to avoid having both your item 2 and 3. We
> better have to choose between executing with QGIS or with something else.
> 
> 
> _______________________________________________
> Qgis-developer mailing list
> Qgis-developer at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/qgis-developer
> 



More information about the Qgis-developer mailing list