[Qgis-developer] Table join branch

Martin Dobias wonder.sk at gmail.com
Thu Jul 29 09:20:06 EDT 2010


Hi Marco

finally I've found time to look at the code...

On Mon, Jul 19, 2010 at 9:10 AM, Marco Hugentobler
<marco.hugentobler at sourcepole.ch> wrote:
>
> Internally, QgsVectorLayer stores the join info (fields and join table provider
> key) and remaps the field ids (first the provider ids, then the joined field ids
> and last the added field ids). When querying the joined field values, a subset
> string is set to the table provider to query a feature with the value of the
> join field.
>
> As you might expect, there are still a number of issues:
> - Joined fields cannot be edited (should they?). So attribute table and info-
> tool should disable those rows. QgsVectorLayer probably needs a method to tell
> attribute table and info tool which rows are not editable.
> - Performance is an issue. The current implementation tries to improve
> performance by automatically creating attribute indices in the ogr provider
> (could be implemented in other providers too). Still there is a siginificant
> performance penalty when doing classifications based on joined attributes or
> attributes searches (info tool and opening attribute table are usually fast,
> because not many features are queried at once).
> One possibility to enhance this could be to load tables into memory providers
> and implementing attribute index capability there. Are there other
> possibilities?

Just to be sure with the table joins, do we refer only to left outer
joins on one table where the join predicate checks for equality of
attributes? I.e. something like this:
SELECT * FROM layer LEFT OUTER JOIN table1 ON layer.a = table1.x [
LEFT OUTER JOIN table2 ON layer.b = table2.y ... ]
Or could there be other scenarios like other join types (inner, right
outer), more complex join predicates (possibly involving geometries)?

(I will assume only left outer join from now on)

In my opinion, the current implementation is not suitable for
production use. In general case, the joining needs linear time for
fetching attributes for each feature. Using the attribute indexes can
get this typically to logarithmic time, but still there's need that
user creates the indexes and provider supports that (probably not many
ogr drivers have this functionality, postgres provider might not have
sufficient privileges etc). I think we should aim to find joined
table's row for each feature in constant time, otherwise the
performance will be still an issue.

If we can assume that the joined table won't be modified (much), we
could build a hash table for each joined table: key = join attribute,
value = vector of joined attributes. So, later when the layer is being
iterated, for each feature we would only have to look up the attribute
in hash table and copy the joined attributes. (Additionally, if we
wanted to save memory, the hash table values could be only feature ID
in the joined table - the joined attributes would be loaded at once
when starting iteration over layer and deleted again when the
iteration over layer has finished).

This way I hope we could achieve good performance when fetching
features and to keep the design simple (and with no need for external
attribute indexes).

I will not post comments on any particular source code issues, as it
is probably too early for that and the focus should stay on the design
level.

Regards
Martin


More information about the Qgis-developer mailing list