[Qgis-developer] New API for access to vector features

Marco Hugentobler marco.hugentobler at sourcepole.ch
Mon Jul 5 04:19:08 EDT 2010


Hi Martin

Thank you for this detailed description. I agree that the iterator pattern 
is a good solution for concurrent vector layer access. I have run into 
concurrency problems many times, e.g. in the print composer and in the 
analysis methods, so I will be happy once I can remove the workarounds for 
them.

> Which approach will be used can be
> decided at the level of each provider, so if some providers don't
> support concurrent access at all (or the implementation would be too
> complicated), we can stick with the use of locking.

Note that there are also a number of situations (especially in vector 
analysis) where we could have conflicts within the same thread. E.g. when 
calculating inverse distance interpolation, we need to calculate, for each 
point, the distance to all other points. It would be convenient to use two 
iterators on the same layer for this, which would probably cause a deadlock 
if the provider uses locking (currently it's even worse: we store all the 
points in memory for the IDW calculation).
Maybe we should declare concurrent access mandatory for 2.0 providers?
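
To illustrate the two-iterators case, here is a minimal sketch of a 
leave-one-out IDW pass. SamplePoint, MockIterator and idwLeaveOneOut are 
hypothetical stand-ins for this mail only, not QGIS classes; the only thing 
borrowed from the proposal is the nextFeature() calling convention. The point 
is that each iterator keeps its own cursor, so the inner loop does not 
disturb the outer one.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a point feature with one attribute; not a QGIS class.
struct SamplePoint { double x; double y; double value; };

// Hypothetical iterator over an in-memory "layer".
class MockIterator
{
  public:
    explicit MockIterator( const std::vector<SamplePoint>& pts ) : mPts( &pts ), mPos( 0 ) {}

    // Same convention as the proposed nextFeature(): returns false when exhausted.
    bool nextFeature( SamplePoint& p )
    {
      if ( mPos >= mPts->size() ) return false;
      p = ( *mPts )[mPos++];
      return true;
    }

  private:
    const std::vector<SamplePoint>* mPts;
    std::size_t mPos;   // every iterator has its own cursor
};

// For every point, estimate its value from all other points, using an outer
// and an inner iterator over the same data at the same time.
std::vector<double> idwLeaveOneOut( const std::vector<SamplePoint>& layer, double power = 2.0 )
{
  std::vector<double> estimates;
  MockIterator outer( layer );
  SamplePoint a;
  while ( outer.nextFeature( a ) )
  {
    double weightedSum = 0.0;
    double weightTotal = 0.0;
    MockIterator inner( layer );   // second, independent iterator
    SamplePoint b;
    while ( inner.nextFeature( b ) )
    {
      double d = std::hypot( a.x - b.x, a.y - b.y );
      if ( d == 0.0 ) continue;    // skip the point itself (and exact duplicates)
      double w = 1.0 / std::pow( d, power );
      weightedSum += w * b.value;
      weightTotal += w;
    }
    // Fall back to the point's own value if it has no neighbours.
    estimates.push_back( weightTotal > 0.0 ? weightedSum / weightTotal : a.value );
  }
  return estimates;
}
```

With a provider that locks in select() and only unlocks after the last 
feature, the inner iterator in this pattern would wait on a lock held by the 
outer one in the same thread.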


An interesting point when implementing concurrent access without locking 
would probably be robust iterators (i.e. iterators that stay usable if the 
provider is edited while they are open). Maybe something for 2.0...


Regards,
Marco   



On Saturday, 3 July 2010, at 10:32:28, Martin Dobias wrote:
> Hi all,
> 
> (sorry for a long post)
> 
> while working on the threaded rendering of map layers, it became clear
> that we need an improved API for retrieving features from vector
> layers. We need to provide safe concurrent access to features from
> multiple threads. That's the situation where a worker thread (for
> rendering) is fetching data from a layer and in the meantime the main
> thread starts fetching from the same layer. Currently, the main thread
> would interfere with the worker thread - if we're lucky, we would
> fetch incorrect data in both threads, but generally this can (and
> will) lead to segfaults.
> 
> There are basically two ways we could handle the concurrent access:
> 1. use locking in providers (e.g. with mutexes) to guarantee that only
> one thread is accessing vector data store at one time.
> 2. update providers to allow concurrent access - e.g. for PostGIS
> provider this would mean to work with more queries at once.
> 
> To briefly compare the two approaches, the second one is more
> user-friendly - e.g. the user can identify a layer (with identify map
> tool) while the layer is being rendered, without any freezes of GUI.
> If locking were used, the layer would first have to be rendered
> completely, and only then could its features be identified,
> producing a freeze in the GUI until the layer is rendered. On the other
> hand, implementation of locking is typically simpler than allowing
> concurrent access without blocking. Which approach will be used can be
> decided at the level of each provider, so if some providers don't
> support concurrent access at all (or the implementation would be too
> complicated), we can stick with the use of locking.
> 
> I've been trying to implement locking for the current API as is -
> using vector data providers' functions select(), nextFeature() and
> rewind(), but I've kept failing on the issue with "unfinished"
> iterations. Imagine this code:
> provider->select(...);
> while ( provider->nextFeature(f) )
> {
>   // do something
>   if ( x > 10 ) break;
> }
> If a locking scheme were used, the provider would get locked in the
> select() method and unlocked again when the last feature has been
> fetched. But in the code example above, the last feature probably
> won't be reached and fetching of the features will stop earlier.
> The provider stays locked and there is no way to determine that no
> more features will be fetched, so other threads can never start
> fetching features. When trying to implement support for simultaneous
> access without locking, it is likewise unclear when the query can be
> closed and resources freed.
> 
> So, I've worked on an implementation of the Iterator design pattern
> for read access. The vector data provider interface should get the
> following new function:
> QgsFeatureIterator getFeatures( ... );
> The arguments (which I've shortened with three dots) are the same as
> currently for the select() method: what attributes to fetch, what
> rectangle, whether to fetch geometries and whether to use intersection
> test.
> 
> The QgsFeatureIterator is actually a proxy for the iterator of data
> provider. The class looks as follows:
> class QgsFeatureIterator
> {
> public:
>   // construct invalid iterator
>   QgsFeatureIterator();
>   // construct a valid iterator for iterating a vector layer
>   QgsFeatureIterator( QgsVectorDataProviderIterator* iter );
> 
>   bool nextFeature(QgsFeature& f);
>   bool rewind();
>   bool close();
> };
> 
> The usage - with current API:
> QgsFeature f;
> provider->select( ... );
> while ( provider->nextFeature(f) )
> {
>   // do something
> }
> 
> The usage - with new API:
> QgsFeature f;
> QgsFeatureIterator fi = provider->getFeatures( ... );
> while ( fi.nextFeature(f) )
> {
>   // do something
> }
> 
> You can see the difference in syntax is not big, though there are
> advantages: when the feature iterator instance gets out of scope, the
> close() method is called automatically - the provider can free
> resources and/or release locks, so there's no problem with
> "unfinished" iterations. And if there's support in provider for
> simultaneous access, there could be more feature iterators fi1, fi2
> that access the data at the same time without any interference.
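
A minimal sketch of the "closes itself when it goes out of scope" behaviour 
described above. MockProvider, ScopedIterator and countUntil are hypothetical 
names for illustration, not the QGIS API; the open-iterator counter stands in 
for whatever lock or query handle a real provider would hold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical provider that only tracks how many iterators are open.
class MockProvider
{
  public:
    explicit MockProvider( const std::vector<int>& ids ) : mIds( ids ), mOpenIterators( 0 ) {}
    int openIterators() const { return mOpenIterators; }

  private:
    friend class ScopedIterator;
    std::vector<int> mIds;
    int mOpenIterators;
};

// Hypothetical iterator: acquires on construction, releases in close(),
// and the destructor calls close() so early exits cannot leak the resource.
class ScopedIterator
{
  public:
    explicit ScopedIterator( MockProvider& p ) : mProvider( &p ), mPos( 0 ), mClosed( false )
    {
      ++mProvider->mOpenIterators;
    }
    ~ScopedIterator() { close(); }

    // Kept non-copyable so the sketch stays simple.
    ScopedIterator( const ScopedIterator& ) = delete;
    ScopedIterator& operator=( const ScopedIterator& ) = delete;

    bool nextFeature( int& id )
    {
      if ( mClosed || mPos >= mProvider->mIds.size() ) return false;
      id = mProvider->mIds[mPos++];
      return true;
    }

    // Safe to call more than once; only the first call releases.
    bool close()
    {
      if ( mClosed ) return false;
      mClosed = true;
      --mProvider->mOpenIterators;
      return true;
    }

  private:
    MockProvider* mProvider;
    std::size_t mPos;
    bool mClosed;
};

// An "unfinished" iteration: stops at the first id above 'limit'.
int countUntil( MockProvider& provider, int limit )
{
  int seen = 0;
  ScopedIterator it( provider );
  int id = 0;
  while ( it.nextFeature( id ) )
  {
    ++seen;
    if ( id > limit ) break;   // leave before the last feature
  }
  return seen;                 // 'it' is destroyed here and closes itself
}
```

After countUntil() returns, provider.openIterators() is back to 0 even though 
the loop was left early.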
> 
> From the provider implementer's point of view, the getFeatures()
> function would look like this:
> QgsFeatureIterator MyProvider::getFeatures(...)
> {
>   return QgsFeatureIterator( new QgsOgrFeatureIterator( this, ... ) );
> }
> 
> where QgsOgrFeatureIterator is a class derived from
> QgsVectorDataProviderIterator class (interface):
> 
> class QgsVectorDataProviderIterator
> {
> public:
>   virtual ~QgsVectorDataProviderIterator() {}
>   virtual bool nextFeature(QgsFeature& f) = 0;
>   virtual bool rewind() = 0;
>   virtual bool close() = 0;
> };
> 
> The QgsOgrFeatureIterator would implement the nextFeature(), rewind()
> and close() calls from the QgsVectorDataProviderIterator interface in
> much the same way as the provider originally did. The iterator class
> encapsulates all the reading code, so the provider class gets less
> cluttered.
> 
> You might ask why to use QgsFeatureIterator proxy class, and why not
> to use directly the derivatives of QgsVectorDataProviderIterator. The
> reason is that if we used directly the pointer, it would have to be
> deleted after each use and could easily produce some memory leaks if
> the deletion is omitted (or even could keep the provider locked - as
> the close() method wouldn't be called). The QgsFeatureIterator can be
> easily designed in a way that it can be passed around by value and
> will delete the internal pointer to the iterator implementation once
> it gets out of scope.
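
One possible way to get such a by-value proxy, sketched with 
std::shared_ptr. This is an assumption for illustration only: the mail does 
not say how the internal pointer would be managed, and IteratorImpl, 
CountingImpl, FeatureIteratorProxy and makeIterator are hypothetical names.

```cpp
#include <memory>

// Hypothetical minimal interface, shaped like the one proposed in the mail.
class IteratorImpl
{
  public:
    virtual ~IteratorImpl() {}
    virtual bool nextFeature( int& id ) = 0;
    virtual bool close() = 0;
};

// Hypothetical implementation yielding ids 1..n. 'liveCount' lets the caller
// observe when the implementation object is created and destroyed.
class CountingImpl : public IteratorImpl
{
  public:
    CountingImpl( int n, int& liveCount ) : mN( n ), mNext( 1 ), mLive( liveCount ) { ++mLive; }
    ~CountingImpl() override { --mLive; }

    bool nextFeature( int& id ) override
    {
      if ( mNext > mN ) return false;
      id = mNext++;
      return true;
    }
    bool close() override { mNext = mN + 1; return true; }

  private:
    int mN;
    int mNext;
    int& mLive;
};

// By-value proxy: copies share one implementation object, and the last copy
// to go out of scope deletes it.
class FeatureIteratorProxy
{
  public:
    FeatureIteratorProxy() {}                                               // invalid iterator
    explicit FeatureIteratorProxy( IteratorImpl* impl ) : mImpl( impl ) {}  // takes ownership

    bool nextFeature( int& id ) { return mImpl && mImpl->nextFeature( id ); }
    bool close() { return mImpl && mImpl->close(); }

  private:
    std::shared_ptr<IteratorImpl> mImpl;
};

// Plays the role of getFeatures(): returns the proxy by value.
FeatureIteratorProxy makeIterator( int n, int& liveCount )
{
  return FeatureIteratorProxy( new CountingImpl( n, liveCount ) );
}
```

Note that with this choice a copy of the proxy continues where the original 
left off, because both share the same cursor. Whether copies should share 
state or be independent is a design decision the proposal leaves open.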
> 
> Finally, there is the issue of backward compatibility: we can't just
> remove the current API as it is being used widely in 3rd party plugins
> and applications. The idea is to implement the provider data access
> methods by the means of the new API: each provider would have its own
> "old api" iterator:
> 
> void select( ... )
> {
>   mOldApiIter = getFeatures( ... );
> }
> bool nextFeature( QgsFeature& f )
> {
>   return mOldApiIter.nextFeature( f );
> }
> 
> Basically the same concept should be adopted for the QgsVectorLayer
> class, which also takes into account temporarily added, modified and
> deleted features.
> 
> I believe that if the new API proves to be fine, the current
> select/nextFeature combo will be removed in QGIS 2.0.
> 
> I would really like to hear some feedback on this design from anyone
> who is concerned about handling of vector layers. If you have any
> questions or suggestions, don't hesitate to reply.
> 
> Regards
> Martin
> _______________________________________________
> Qgis-developer mailing list
> Qgis-developer at lists.osgeo.org
> http://lists.osgeo.org/mailman/listinfo/qgis-developer


-- 
Dr. Marco Hugentobler
Sourcepole -  Linux & Open Source Solutions
Webereistrasse 66, 8134 Adliswil, Switzerland
marco.hugentobler at sourcepole.ch http://www.sourcepole.ch
Technical Advisor QGIS Project Steering Committee

