[Qgis-developer] QEP/RFC sqlite virtual tables

Hugo Mercier hugo.mercier at oslandia.com
Thu Oct 30 02:32:53 PDT 2014


Hi Martin,

Thanks for your review.

Le 29/10/2014 17:34, Martin Dobias a écrit :
> Hi Hugo!
> 
> On Tue, Oct 28, 2014 at 6:15 PM, Hugo Mercier <hugo.mercier at oslandia.com> wrote:
>> are there other opinions ?
>> Other arguments for one or the other side ?
> 
> 
> I have to say that initially I was very excited about the idea... and
> after thinking about the details I am getting less excited as there
> are probably lots of gotchas. Below there is list of questions in the
> order of reading the proposal.
> 
> ** Use cases
> 
> - item 1 - in what sense the layer would be "dynamic"?

"create a dynamic point layer based on X,Y(,Z) coordinates" => if points
coordinates in the underlying layer change, then their geometries can be
updated during the next fetch.
Like for example the creation of a point layer from a CSV file with the
"watch file" option.

> 
> ** Indexes
> 
> - how will be indexes requested by underlying providers of primary
> sources? Will you add support to providers to return indices
> associated with a given primary source? How can sqlite make use of
> such information? Are you going to replicate the same indices for the
> virtual tables?

Yes we will need something like this in order to properly use indexes of
underlying layers. By populating sqlite_stat* tables.
And yes I realize that may involve replication of indexes.

For the case where more than one index is available on a table, the
xBestIndex method of the virtual table mechanism is made for that: an
estimated cost can be returned for each constraint


> 
> - the need to use spatial indices explicitly (with such weird syntax)
> is not good - this should be done by DBMS, not by the user. In
> practice nobody will use them and the spatial queries will be slow

Yes it is the Spatialite spatial index syntax.

> 
> ** Serialization
> 
> - I am lost. Are we going to materialize those sqlite views in memory
> or to disk? I thought they were meant to be just views wrapping QGIS
> layers

Yes, they are. They can be part of a project file. The second paragraph
is misleading.

> 
> ** Implementation details
> 
> - regarding QgsVectorLayer constructor: to me it does not look like a
> case special enough to require a new parameter
> 
> - what does it mean the provider would be based on spatialite? a
> subclass of spatialite provider - or a copy of it - or something else?

Not decided yet. But it is probably too early in the discussion to go
into such details.

> 
> - are the insert/update/delete triggers going to be provided by user
> or generated automatically? (I believe it should be automatic)

As automatic as possible. But in the general case, it is not possible.
Some columns cannot be updated (computed during the query). Or there may
be more than one possibility to "update" a join.

> 
> - how do we find out that the views from virtual layers are actually editable?

If an update trigger is defined.

> 
> - how do we determine whether the view has a geometry - and in case of
> multiple geometries, which one to choose? (geometry columns are just
> ordinary blobs, right?)
> 
> - how do we determine primary key for a view? (for feature IDs)

Good points. We will need to pass these information to the constructor,
as it is now with a postgres provider

> 
> ** Python bindings
> 
> - how would I create a virtual layer from Python? I would hope to do
> something like iface.addVectorLayer("SELECT * FROM layerX", "virtual")
> - would that be possible?

This kind of syntax (using loaded layer names instead of their explicit
definition) is clean but would require :
* that layers are already loaded before we can use them
* then to save the "loading order" of layers in the project file

The document refers to the use of explicit layer urls.
I don't have a strong opinion on this choice.

> 
> ** User interface
> 
> - what is the "referenced layers" box for - is it just to inform user
> which layers he used? Or does he need to explicitly register those
> "primary" layers somewhere?

See previous answer

> 
> ** Performance
> 
> - how will sqlite plan execution of queries?
>   - inner join of two virtual tables based on attributes?
>   - spatial joins?
> - does sqlite have access to any statistics of virtual tables and
> indices? These are quite important for reasonable query execution
> plans.

See answer more on the top: sqlite_stat* tables, xBestIndex

> 
> - During normal operation when iterating over vector layer there is
> conversion: backend -> QgsFeature. As far as I understand the virtual
> tables add two more conversions during execution: QgsFeature -> sqlite
> -> QgsFeature (first in virtual table implementation, second in new
> provider implementation). Is that correct? Have you tried to measure
> additional overhead?

Hmmm correct. I did not have any measure yet ...

> 
> 
> In summary:
> 
> I have the impression that virtual tables are just a simple way to
> provide access to external data to sqlite, not an invitation to use
> sqlite engine with "foreign" data. The provided API simply does not
> look mature enough. I would not be too optimistic about openness of
> sqlite project - the developer list is not publicly available and I
> haven't found a link to repository either (except for the commit log)
> - but I may be wrong.

http://osdir.com/ml/sqlite-dev/
http://www.sqlite.org/cgi/src/tree?ci=trunk

> 
> Aren't there any other embedded SQL engines that we could use?

You mean another powerful, embeddable, open source, spatial-enabled, SQL
engine ?
I don't see :)

> 
> Maybe we should have a simple SQL 'SELECT' statement parser that would
> parse the queries and then decide what to do with them:
> - execute on backend (if all tables are from the same DB)
> - execute with existing QGIS functionality (if we have efficient
> algorithm for that)
> - execute with sqlite engine (with materialized views)

Ok, after thinking about issues Matthias and you raised, inclusion of
the syntax parsing part of (a subset of) SQL in QGIS may be needed,
especially to generate spatial index queries ...

I still think we should try to avoid having both your item 2 and 3. We
better have to choose between executing with QGIS or with something else.




More information about the Qgis-developer mailing list