[Qgis-developer] Processing NG/V2 - brainstorming

Nyall Dawson nyall.dawson at gmail.com
Mon Dec 5 01:03:25 PST 2016


Hi all,

I've recently been informally chatting about possible enhancements to
processing with a few QGIS team members, and I thought it'd be worth
starting a public brainstorm about these ideas.

This is really just "thinking aloud" about what the next logical steps
are for processing and how we can make it more competitive against
programs like FME.

So here we go... a bunch of random ideas on future processing enhancements:


1. Rework native algorithms to avoid layer input/outputs

(Full credit goes to Matthias here). One current inefficiency with
processing models is that every step is exported to a file-based
format, which is then reread for the next algorithm. This means
that a simple set of steps like buffer->reproject involves multiple
conversions from OGR formats to QgsFeature/QgsGeometry and back to the
OGR output format, when it could be simplified to just two operations
on a QgsFeature's geometry and then a final write to disk. In addition
to the inefficiency here we also lose things like long field names and
full support for z/m/curves (depending on the intermediate file
format).

So... how to address this... Matthias came up with the idea that
native processing algs could accept a feature iterator instead of an
input layer, and themselves be a feature iterator. This effectively
would make a processing model a chain of iterators which features are
"pulled" through from a final writer step. Ie, the source
layer->buffer->save to layer->load layer->reproject->save to layer
model becomes:

a writer
-> which reads features from the iterator provided by a transform alg
-> which reprojects the geometry from features provided by a buffer
alg's output iterator
-> which buffers the geometry on features from an iterator from the
original source layer

(Obviously, anytime a non-native algorithm (eg saga/grass/ogr) is used
then the features would need to be written to disk first. But this is
no different to the current behaviour so there shouldn't be any extra
cost incurred.)
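
A rough sketch of the pull-through chain described above, in plain
Python generators rather than real QGIS classes. Everything here is
made up for illustration: the function names are hypothetical, the
"buffer" is just a square box around a point, and the "reprojection"
is a simple coordinate shift standing in for a CRS transform.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # xmin, ymin, xmax, ymax


def source_layer(points: Iterable[Point]) -> Iterator[Point]:
    """Hypothetical source: yields one feature (here just a point) at a time."""
    for point in points:
        yield point


def buffer_alg(features: Iterable[Point], distance: float) -> Iterator[Box]:
    """Toy buffer: a square box around each point, not a real buffer."""
    for x, y in features:
        yield (x - distance, y - distance, x + distance, y + distance)


def reproject_alg(features: Iterable[Box],
                  transform: Callable[[float, float], Point]) -> Iterator[Box]:
    """Toy reprojection: applies `transform` to both corners of each box."""
    for xmin, ymin, xmax, ymax in features:
        yield transform(xmin, ymin) + transform(xmax, ymax)


def writer(features: Iterable[Box]) -> List[Box]:
    """Final step: pulling from the chain here drives every step above."""
    return list(features)


# Nothing is computed until the writer starts pulling features through.
chain = reproject_alg(
    buffer_alg(source_layer([(0.0, 0.0), (10.0, 5.0)]), distance=1.0),
    transform=lambda x, y: (x + 100.0, y + 200.0),  # stand-in for a CRS transform
)
result = writer(chain)
```

Each feature passes through buffer and reproject in memory and is only
materialised once, at the writer, which is the saving the idea is
after.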

This gets a little trickier when we want to multithread something, eg.
2 input layers-> each buffered -> intersection of the two. But we
could handle this by using a form of "pipe" iterator, which sucks in
features as fast as possible from its input iterator and stores them
in one thread, and then an algorithm in another thread consumes these
features as they become available. Ie:

thread a:
input layer 1 iterator -> buffer 1 alg iterator -> "pipe" iterator a ->

                                thread c: intersection alg

thread b:
input layer 2 iterator -> buffer 2 alg iterator -> "pipe" iterator b ->

where thread c reads the features from "pipe iterator a" and "b" as
they become available, and then does its processing on them.

(hope that makes sense!)
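
To make the "pipe" idea a bit more concrete, here is a minimal sketch
using only Python's standard queue and threading modules. PipeIterator
and the toy buffer/intersection functions are hypothetical names for
illustration, not existing QGIS API, and error handling inside the
worker thread is left out.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a feature stream


class PipeIterator:
    """Drains `source` in its own thread; iterating yields items as they arrive."""

    def __init__(self, source, maxsize=0):
        self._queue = queue.Queue(maxsize)
        self._thread = threading.Thread(
            target=self._fill, args=(source,), daemon=True)
        self._thread.start()

    def _fill(self, source):
        try:
            for item in source:
                self._queue.put(item)
        finally:
            self._queue.put(_DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the producer has something
        if item is _DONE:
            self._queue.put(_DONE)  # keep raising on repeated next() calls
            raise StopIteration
        return item


def buffer_alg(points, distance):
    """Toy buffer: a square box (xmin, ymin, xmax, ymax) around each point."""
    for x, y in points:
        yield (x - distance, y - distance, x + distance, y + distance)


def intersection_alg(boxes_a, boxes_b):
    """Toy intersection: yields the overlap of every overlapping pair of boxes."""
    b_list = list(boxes_b)  # the second input has to be fully read first
    for a in boxes_a:
        for b in b_list:
            xmin, ymin = max(a[0], b[0]), max(a[1], b[1])
            xmax, ymax = min(a[2], b[2]), min(a[3], b[3])
            if xmin < xmax and ymin < ymax:
                yield (xmin, ymin, xmax, ymax)


# "thread a" and "thread b" buffer their inputs concurrently;
# the calling thread plays the role of "thread c".
pipe_a = PipeIterator(buffer_alg([(0, 0), (10, 10)], 1))
pipe_b = PipeIterator(buffer_alg([(1, 1)], 1))
overlaps = list(intersection_alg(pipe_a, pipe_b))
```

Passing a maxsize to the pipe would bound how many features it stores
ahead of the consumer, at the cost of the producer thread sometimes
having to wait.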

2. Georeferenced geometries

I think for the approach in 1 to work we'd also need to introduce the
concept of "referenced" geometries. This would basically be
QgsGeometry + a QgsCoordinateReferenceSystem. It would allow retrieval
of a geometry's CRS without requiring any knowledge of its source
layer (or where no layer exists, eg the canvas extent as a geometry).

I've pondered several approaches to this, such as:
- QgsReferencedFeature (QgsFeature + crs): This doesn't work for
non-feature based geometries or allow features with multiple
geometries in different CRSes. (See
https://github.com/qgis/qgis3.0_api/issues/21).
- QgsReferencedGeometry: subclass of QgsGeometry with a CRS member.
This approach would avoid adding any extra overhead to QgsGeometry.
But given that the main use of QgsGeometry (geometry attached to a
feature from a layer) will always have a CRS associated, this seems
like it unnecessarily complicates the API.

So my current preference would be for QgsGeometry to gain a
QgsCoordinateReferenceSystem member variable, which is an invalid crs
if the geometry is not referenced. This should still be quite
lightweight given that QgsCoordinateReferenceSystem is implicitly
shared.
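
As a rough illustration of that preference, here is a sketch using
plain Python dataclasses. Crs and Geometry are hypothetical stand-ins
for QgsCoordinateReferenceSystem and QgsGeometry, not the real classes,
and the empty auth id is only an assumption for how "invalid" might be
represented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Crs:
    """Hypothetical stand-in for QgsCoordinateReferenceSystem."""
    authid: str = ""  # an empty string plays the role of an invalid CRS

    def is_valid(self) -> bool:
        return bool(self.authid)


@dataclass(frozen=True)
class Geometry:
    """Hypothetical stand-in for a QgsGeometry that carries its own CRS."""
    wkt: str
    crs: Crs = field(default_factory=Crs)  # unreferenced unless a CRS is given


def needs_reprojection(geometry: Geometry, target: Crs) -> bool:
    """Decide from the geometry alone, with no source layer involved."""
    if not geometry.crs.is_valid():
        raise ValueError("geometry is not referenced")
    return geometry.crs != target


# A geometry with no layer behind it (eg the canvas extent) still knows its CRS.
canvas_extent = Geometry("POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))", Crs("EPSG:4326"))
unreferenced = Geometry("POINT(1 2)")
```

An algorithm in the iterator chain could then check
needs_reprojection(feature_geometry, target_crs) without ever asking
which layer the feature came from.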


3. Porting components of processing to core

There's demand (from eg QField) to reuse parts of processing outside of PyQGIS.

I think good candidates for porting to core would be:
- parameters
- inputs
- the algorithm base class

In addition to allowing use outside of Python, this would also help
strengthen these components through the static typing that would
result from porting to C++.

I'd also like to see the results + history dialogs merged and moved to
core so that they can also be reused for non-processing tasks (eg
composer exports).



So there we go. What are everyone's thoughts? Are these ideas worth
pursuing? Are there other things we should be investigating for
future processing enhancements?

Nyall

