[Qgis-developer] Table joins

Matthias Kuhn matthias.kuhn at gmx.ch
Mon Nov 25 11:19:09 PST 2013


Sorry for the long text, I will try to keep it interesting for you.

On 25.11.2013 18:28, Hugo Mercier wrote:
> For the aggregate functions, we do not have this kind of things for
> regular, "unjoined" tables, so we won't have it for joined tables.
Not yet ;)
> We could stick to the current behaviour of QGIS joins for now : keep
> rows of the "left" part and get the first row (or null) of the right
> table(s).
I would prefer to leave this choice to the user.
>
>>> I did not yet have a look at the code, but are there any chance we can
>>> treat fields of a "relation" as first-class fields in a layer ? The idea
>>> would be to use them to define 1:1 (current "joins") and get rid of
>>> layer's "joins".
>> As relational models can get rather complex, just showing related 
>> records is not always what we want. Sometimes, the related table has 
>> another relation to another relation and this can get very complex (and 
>> time-consuming), so I would rather opt to have some switch about which 
>> relations we should treat as first-class fields of a table.
> You're right. We would have to specify somewhere what relations we would
> like to see in a join.
>
>> For the expressions, there should be a syntax introduced for joining 
>> other tables / resolving relations.
> The only issue I see is fields from different tables that may have the
> same name. So either we need some kind of "." prefix, or we need a way
> to declare a remapping of names.
> And this should be applied to the different $geometry fields of the
> tables involved in the relation.
Just thinking out loud, so these are just ideas...

A "." notation would be unambiguous and could be chained over several
intermediate relations.

However, I'm not sure if we want/need to stick to only the defined
relations in the case of expressions. A JOIN x ON y may be suited as
well? So if you want to join something you didn't define (and you maybe
need only once) there is no need to open the relations dialog first.

>
>>> We are also asked for the possible addition of spatial joins within
>>> QGIS. I would find it nice to be able to define them as "relations".
>>> Probably starting with a 1:1 "intersects" join, but with an open
>>> possibility for 1:N spatial joins (contains, etc.), and why not free
>>> custom user expressions as join predicates (including "left" and "right"
>>> geometries for instance)
>>>
>> Spatial joins definitely a nice idea, I hope we can get to this one 
>> day, but I didn't think a lot about it yet.
>>
>> When it comes to the API I think the joins (outer, inner, left, right) 
>> is a rather easy call to relation object returning an iterator. What I 
>> am not sure is about an appropriate UI where the user can define such a 
>> thing as a join so the attribute table shows the resulting table.
> Relation properties can be used to define the type of join you want
> (left, cross, "qgis old join style", etc.) and the predicate to use (==
> by default, or an expression with "intersects()" or whetever you like).
The predicate sounds like an idea. However, the current solution is
targeted at fully normalized databases with 1:N relations only.
intersects() or > or similar would introduce N:M relations and should be
treated as a different kind of relation (e.g. unlink and link would not
be available there).
Do we always need the relation in the same way? Or should the "type" of
a relation be defined on a per-task basis rather than a project basis?
> Then in the layer's properties, you could define what relations are
> "seen" by this layer, and so what fields are available to the attribute
> table.
I somehow like the idea, I would see the remapping of names at the same
position (==> Choose manually which fields you want to have
"hard-joined"). Chaining should also be available here, so fields on a
table C linked via a table B should also be visible on A.
If we do this, the fields should be considered parts of the features, so
I guess if we change a feature on table B which is linked by several
features of table A, layer A should emit appropriate attributeChanged
signals for all the features?
Really interesting this becomes with the N:M relations introduced above
with the predicate. If you change the attribute of a feature on table B,
you have to do an intersect with table A to check for affected features
to emit the appropriate signal.

>
> And then, we could tag the old "Join" functionality obsolete.
>
To be honest, I would like to tag it obsolete. But it seems like a
tricky thing and I don't want to reduce the flexibility of the new
solution or still miss a feature of joins, before tagging the joining
obsolete.

One more thing we need to resolve before we can mark it obsolete is
performance. The joins have been cached, the relations are not. Maybe
there will also be a cache, but for a real database I would like to have
a more sophisticated solution executed on the back-end rather than
trying to solve the issues locally. Again the scenario of a complex
database: you need to keep a local (possibly huge) cache for every table
just for the joins and you will still be slower in joining the data than
a dedicated database server will have the joined data ready.

So my conclusion for the moment is, that with relations we are entering
an area where we need to integrate database backends better, while joins
offer an easy approach to solve a lot of issues. Therefore I would like
to keep the two solutions side-by-side for some releases and wait for
the relations to settle down and declare joins as obsolete, once we are
sure that nobody will miss them.


More information about the Qgis-developer mailing list