[Qgis-developer] QEP/RFC sqlite virtual tables

Wed Oct 29 09:34:10 PDT 2014

Hi Hugo!

On Tue, Oct 28, 2014 at 6:15 PM, Hugo Mercier <hugo.mercier at oslandia.com> wrote:
> are there other opinions ?
> Other arguments for one or the other side ?

I have to say that initially I was very excited about the idea... and
after thinking about the details I am getting less excited as there
are probably lots of gotchas. Below there is list of questions in the
order of reading the proposal.

** Use cases

- item 1 - in what sense the layer would be "dynamic"?

** Indexes

- how will be indexes requested by underlying providers of primary
sources? Will you add support to providers to return indices
associated with a given primary source? How can sqlite make use of
such information? Are you going to replicate the same indices for the
virtual tables?

- the need to use spatial indices explicitly (with such weird syntax)
is not good - this should be done by DBMS, not by the user. In
practice nobody will use them and the spatial queries will be slow

** Serialization

- I am lost. Are we going to materialize those sqlite views in memory
or to disk? I thought they were meant to be just views wrapping QGIS
layers

** Implementation details

- regarding QgsVectorLayer constructor: to me it does not look like a
case special enough to require a new parameter

- what does it mean the provider would be based on spatialite? a
subclass of spatialite provider - or a copy of it - or something else?

- are the insert/update/delete triggers going to be provided by user
or generated automatically? (I believe it should be automatic)

- how do we find out that the views from virtual layers are actually editable?

- how do we determine whether the view has a geometry - and in case of
multiple geometries, which one to choose? (geometry columns are just
ordinary blobs, right?)

- how do we determine primary key for a view? (for feature IDs)

** Python bindings

- how would I create a virtual layer from Python? I would hope to do
something like iface.addVectorLayer("SELECT * FROM layerX", "virtual")
- would that be possible?

** User interface

- what is the "referenced layers" box for - is it just to inform user
which layers he used? Or does he need to explicitly register those
"primary" layers somewhere?

** Performance

- how will sqlite plan execution of queries?
  - inner join of two virtual tables based on attributes?
  - spatial joins?
- does sqlite have access to any statistics of virtual tables and
indices? These are quite important for reasonable query execution
plans.

- During normal operation when iterating over vector layer there is
conversion: backend -> QgsFeature. As far as I understand the virtual
tables add two more conversions during execution: QgsFeature -> sqlite
-> QgsFeature (first in virtual table implementation, second in new
provider implementation). Is that correct? Have you tried to measure
additional overhead?

In summary:

I have the impression that virtual tables are just a simple way to
provide access to external data to sqlite, not an invitation to use
sqlite engine with "foreign" data. The provided API simply does not
look mature enough. I would not be too optimistic about openness of
sqlite project - the developer list is not publicly available and I
haven't found a link to repository either (except for the commit log)
- but I may be wrong.

Aren't there any other embedded SQL engines that we could use?

Maybe we should have a simple SQL 'SELECT' statement parser that would
parse the queries and then decide what to do with them:
- execute on backend (if all tables are from the same DB)
- execute with existing QGIS functionality (if we have efficient
algorithm for that)
- execute with sqlite engine (with materialized views)

Cheers
Martin