[Qgis-developer] New API for access to vector features

Martin Dobias wonder.sk at gmail.com
Sat Jul 3 04:32:28 EDT 2010


Hi all,

(sorry for a long post)

while working on the threaded rendering of map layers, it became clear
that we need an improved API for retrieving features from vector
layers. We need to provide safe concurrent access to features from
multiple threads. Consider a worker thread (for rendering) that is
fetching data from a layer when the main thread starts fetching from
the same layer. Currently, the main thread would interfere with the
worker thread: if we're lucky, we would just fetch incorrect data in
both threads, but in general this can (and will) lead to segfaults.

There are basically two ways we could handle concurrent access:
1. use locking in providers (e.g. with mutexes) to guarantee that only
one thread is accessing the vector data store at any one time.
2. update providers to allow concurrent access - e.g. for the PostGIS
provider this would mean working with several queries at once.

To briefly compare the two approaches, the second one is more
user-friendly - e.g. the user can identify a layer (with the identify
map tool) while the layer is being rendered, without any freezes of the
GUI. If locking were used, the layer would first have to be rendered
completely, and only then could its features be identified, so the GUI
would freeze until the layer is rendered. On the other hand,
implementing locking is typically simpler than allowing concurrent
access without blocking. The approach can be chosen separately for
each provider, so if some providers don't support concurrent access at
all (or the implementation would be too complicated), they can stick
with locking.

I've been trying to implement locking for the current API as is -
using vector data providers' functions select(), nextFeature() and
rewind(), but I've kept failing on the issue with "unfinished"
iterations. Imagine this code:
provider->select(...);
while ( provider->nextFeature(f) )
{
  // do something
  if ( x > 10 ) break;
}
With a locking scheme, the provider would be locked in the select()
method and unlocked again once the last feature has been fetched. But
in the code example above, the last feature probably won't be reached
and fetching will stop earlier. The provider stays locked, and it has
no way to determine that no more features will be fetched so that
other threads can start fetching. Similarly, when trying to implement
support for simultaneous access without locking, it is unclear when
the query can be closed and its resources freed.
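
To make the problem concrete, here is a minimal self-contained sketch.
It is not QGIS code: LockingProvider and readAtMost() are hypothetical
names, features are plain ints, and the "lock" is modelled as an atomic
flag rather than a real mutex, just to show how an early break leaves
the provider locked.

```cpp
#include <atomic>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a provider that "locks" in select() and only
// unlocks once nextFeature() has run past the last feature.
class LockingProvider
{
  public:
    explicit LockingProvider( std::vector<int> features )
      : mFeatures( std::move( features ) ) {}

    void select()
    {
      mLocked = true;   // stays set until the iteration is exhausted
      mPos = 0;
    }

    bool nextFeature( int &f )
    {
      if ( mPos < mFeatures.size() )
      {
        f = mFeatures[mPos++];
        return true;
      }
      mLocked = false;  // only reached when the caller reads to the end
      return false;
    }

    bool isLocked() const { return mLocked; }

  private:
    std::vector<int> mFeatures;
    std::size_t mPos = 0;
    std::atomic<bool> mLocked{ false };
};

// Reads at most `limit` features and then stops, like the loop with `break`.
int readAtMost( LockingProvider &provider, int limit )
{
  int count = 0;
  int f = 0;
  provider.select();
  while ( provider.nextFeature( f ) )
  {
    ++count;
    if ( count >= limit ) break;
  }
  return count;
}
```

Calling readAtMost() with a limit smaller than the number of features
leaves isLocked() returning true, because nothing ever tells the
provider that the caller has lost interest.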

So, I've worked on an implementation of the Iterator design pattern
for read access. The vector data provider interface should get the
following new function:
QgsFeatureIterator getFeatures( ... );
The arguments (which I've shortened with three dots) are the same as
currently for the select() method: what attributes to fetch, what
rectangle, whether to fetch geometries and whether to use intersection
test.

The QgsFeatureIterator is actually a proxy for the iterator of data
provider. The class looks as follows:
class QgsFeatureIterator
{
public:
  // construct invalid iterator
  QgsFeatureIterator();
  // construct a valid iterator for iterating a vector layer
  QgsFeatureIterator( QgsVectorDataProviderIterator* iter );

  bool nextFeature(QgsFeature& f);
  bool rewind();
  bool close();
};

The usage - with current API:
QgsFeature f;
provider->select( ... );
while ( provider->nextFeature(f) )
{
  // do something
}

The usage - with new API:
QgsFeature f;
QgsFeatureIterator fi = provider->getFeatures( ... );
while ( fi.nextFeature(f) )
{
  // do something
}

You can see the difference in syntax is not big, but there are
advantages: when the feature iterator instance goes out of scope, the
close() method is called automatically - the provider can free
resources and/or release locks, so there's no problem with
"unfinished" iterations. And if the provider supports simultaneous
access, there can be several feature iterators fi1, fi2 that access
the data at the same time without any interference.
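
The scope-based cleanup can be sketched as follows. This is only an
illustration under assumptions: ToyProvider and ScopedFeatureIterator
are hypothetical names, features are ints, the lock is a flag, and
copying is disabled just to keep the example short (the real proxy is
meant to be passed by value).

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical provider: some data plus a flag standing in for a lock.
struct ToyProvider
{
  std::vector<int> features;
  std::atomic<bool> locked{ false };
};

// Iterator that locks the provider on construction and releases it in
// close(), which the destructor calls automatically.
class ScopedFeatureIterator
{
  public:
    explicit ScopedFeatureIterator( ToyProvider &p ) : mProvider( &p )
    {
      mProvider->locked = true;
    }
    ~ScopedFeatureIterator() { close(); }

    // copying disabled in this sketch only
    ScopedFeatureIterator( const ScopedFeatureIterator & ) = delete;
    ScopedFeatureIterator &operator=( const ScopedFeatureIterator & ) = delete;

    bool nextFeature( int &f )
    {
      if ( !mProvider || mPos >= mProvider->features.size() ) return false;
      f = mProvider->features[mPos++];
      return true;
    }

    // Releases the provider; returns false if already closed.
    bool close()
    {
      if ( !mProvider ) return false;
      mProvider->locked = false;
      mProvider = nullptr;
      return true;
    }

  private:
    ToyProvider *mProvider;
    std::size_t mPos = 0;
};

// Breaks out of the loop early and reports whether the provider is
// still locked afterwards.
bool lockedAfterEarlyBreak( ToyProvider &p )
{
  {
    ScopedFeatureIterator fi( p );
    int f = 0;
    while ( fi.nextFeature( f ) )
    {
      if ( f > 1 ) break;
    }
  } // fi goes out of scope here, so close() runs
  return p.locked;
}
```

Even though the loop stops before the last feature, the provider is
released as soon as the iterator leaves its scope.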

From the provider implementer's point of view, the getFeatures()
function would look like this:
QgsFeatureIterator MyProvider::getFeatures(...)
{
  return QgsFeatureIterator( new QgsOgrFeatureIterator( this, ... ) );
}

where QgsOgrFeatureIterator is a class derived from
QgsVectorDataProviderIterator class (interface):

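class QgsVectorDataProviderIterator
{
public:
  // virtual destructor so that derived iterators are destroyed correctly
  virtual ~QgsVectorDataProviderIterator() {}

  virtual bool nextFeature(QgsFeature& f) = 0;
  virtual bool rewind() = 0;
  virtual bool close() = 0;
};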

The QgsOgrFeatureIterator would implement the nextFeature(), rewind()
and close() calls from the QgsVectorDataProviderIterator interface in
much the same way as the provider originally did. The iterator class
encapsulates all the reading code, so the provider class gets less
cluttered.

You might ask why we need the QgsFeatureIterator proxy class instead
of using the derivatives of QgsVectorDataProviderIterator directly.
The reason is that if we used the pointer directly, it would have to
be deleted after each use, and an omitted deletion would easily cause
memory leaks (or could even keep the provider locked, as the close()
method wouldn't be called). The QgsFeatureIterator can easily be
designed so that it can be passed around by value and will delete the
internal pointer to the iterator implementation once it goes out of
scope.
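
One possible way to build such a proxy is sketched below. It is not
the actual implementation: BackendIterator, ProxyIterator and
CountingIterator are hypothetical names, features are ints, and the
use of std::shared_ptr with a custom deleter (so that all copies share
one backend, which is closed and deleted when the last copy goes away)
is my assumption about how the ownership could work.

```cpp
#include <memory>

// Hypothetical minimal backend interface (features are ints here).
class BackendIterator
{
  public:
    virtual ~BackendIterator() {}
    virtual bool nextFeature( int &f ) = 0;
    virtual bool close() = 0;
};

// Proxy that can be copied freely; all copies share the same backend.
class ProxyIterator
{
  public:
    ProxyIterator() {}  // invalid iterator, yields nothing

    // Takes ownership of iter; the deleter closes and deletes it once
    // the last copy of the proxy is destroyed.
    explicit ProxyIterator( BackendIterator *iter )
      : mIter( iter, []( BackendIterator *p ) { if ( p ) { p->close(); delete p; } } ) {}

    bool nextFeature( int &f ) { return mIter && mIter->nextFeature( f ); }
    bool close() { return mIter && mIter->close(); }

  private:
    std::shared_ptr<BackendIterator> mIter;
};

// Backend that yields 0, 1, ..., count-1 and records its own cleanup.
class CountingIterator : public BackendIterator
{
  public:
    CountingIterator( int count, int &closeCalls, int &deletions )
      : mCount( count ), mCloseCalls( closeCalls ), mDeletions( deletions ) {}
    ~CountingIterator() override { ++mDeletions; }

    bool nextFeature( int &f ) override
    {
      if ( mClosed || mNext >= mCount ) return false;
      f = mNext++;
      return true;
    }

    // Counts only the first close; later calls return false.
    bool close() override
    {
      if ( mClosed ) return false;
      mClosed = true;
      ++mCloseCalls;
      return true;
    }

  private:
    int mCount;
    int mNext = 0;
    bool mClosed = false;
    int &mCloseCalls;
    int &mDeletions;
};
```

With this design the caller never sees a raw pointer: a ProxyIterator
can be returned from a function, copied and dropped, and the backend is
closed and deleted exactly once.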

Finally, there is the issue of backward compatibility: we can't just
remove the current API as it is widely used in third-party plugins
and applications. The idea is to implement the provider's existing
data access methods by means of the new API: each provider would have
its own "old api" iterator:

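void select( ... )
{
  // start a new iteration through the new API
  mOldApiIter = getFeatures( ... );
}
bool nextFeature( QgsFeature& f )
{
  // forward to the iterator created by the last select()
  return mOldApiIter.nextFeature( f );
}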
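
A self-contained toy version of this wrapper, to show that the old
calls and the new iterators do not disturb each other. LegacyProvider
and SimpleIterator are hypothetical names, features are ints, and each
iterator simply works on its own copy of the data, which is an
assumption made only to keep the sketch short.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical iterator over a private copy of the data.
class SimpleIterator
{
  public:
    SimpleIterator() {}  // empty iterator, yields nothing
    explicit SimpleIterator( std::vector<int> features )
      : mFeatures( std::move( features ) ) {}

    bool nextFeature( int &f )
    {
      if ( mPos >= mFeatures.size() ) return false;
      f = mFeatures[mPos++];
      return true;
    }

  private:
    std::vector<int> mFeatures;
    std::size_t mPos = 0;
};

class LegacyProvider
{
  public:
    explicit LegacyProvider( std::vector<int> features )
      : mFeatures( std::move( features ) ) {}

    // new API: every call returns an independent iterator
    SimpleIterator getFeatures() const { return SimpleIterator( mFeatures ); }

    // old API, implemented on top of the new one
    void select() { mOldApiIter = getFeatures(); }
    bool nextFeature( int &f ) { return mOldApiIter.nextFeature( f ); }

  private:
    std::vector<int> mFeatures;
    SimpleIterator mOldApiIter;  // state of the current old-style iteration
};
```

Existing code keeps calling select() and nextFeature() unchanged, while
new code can hold its own iterator from getFeatures() at the same time.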

Basically the same concept should be adopted for the QgsVectorLayer
class, which also takes into account temporarily added, modified and
deleted features.

I believe that if the new API proves to be fine, the current
select/nextFeature combo will be removed in QGIS 2.0.

I would really like to hear some feedback on this design from anyone
who is concerned about handling of vector layers. If you have any
questions or suggestions, don't hesitate to reply.

Regards
Martin

