[Qgis-developer] Processing NG/V2 - brainstorming

Matthias Kuhn matthias at opengis.ch
Mon Dec 5 11:46:53 PST 2016


Thanks for writing this up Nyall,

This is something that has been on my list of ideas for a long time
already and that I have never got round to writing down.

I think you have covered the topic quite well already, so I don't think
there's a lot I have to add right now.

On 12/05/2016 06:41 PM, Alexander Bruy wrote:

> Regarding iterators approach... It sounds interesting, but we need to think
> also about possibility to work with layers, which are not loaded in QGIS.
> In this case as I understand we need to construct layer and then create
> an iterator for Processing from it.

If the algorithm is a QGIS core one, that's correct. But I assume that's
done already now, so it shouldn't be much of a difference.

For anything else, the iterator approach will not be usable and the
data will have to end up in a file anyway. So the model processor
should be smart enough to handle that.

Case 1:
  qgis alg -> qgis alg
  iterator

Case 2:
  qgis alg -> external alg
  find a suitable file format and route the iterator to this file

Case 3:
  external alg -> external alg
  check if they have a common file type; if yes, use it, if not, convert
  via something usable.

If an external file is involved as source, a "file loader" algorithm
should be transparently inserted to convert it to an iterator, find the
best intermediate format, or just pass the file as-is.
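
A minimal sketch of how a model processor might choose between these
cases, in plain Python. The names Alg and plan_transfer and the way
formats are declared are made up for illustration; none of this is
existing Processing API, and it assumes every external algorithm
declares at least one file format.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Alg:
    """Hypothetical description of one algorithm in a model."""
    name: str
    is_native: bool                         # True for a QGIS core algorithm
    formats: FrozenSet[str] = frozenset()   # file formats an external alg handles


def plan_transfer(producer: Alg, consumer: Alg) -> str:
    """Describe how data should travel from producer to consumer."""
    for alg in (producer, consumer):
        if not alg.is_native and not alg.formats:
            raise ValueError(f"{alg.name} declares no file formats")

    if producer.is_native and consumer.is_native:
        # Case 1: both ends can work on an iterator directly.
        return "iterator"
    if producer.is_native:
        # Case 2: drain the iterator into a file the consumer can read.
        return f"write iterator to {min(consumer.formats)} file"
    if consumer.is_native:
        # External source: a "file loader" step turns the file into an iterator.
        return f"load {min(producer.formats)} file into iterator"

    # Case 3: two external algorithms.
    common = producer.formats & consumer.formats
    if common:
        return f"pass {min(common)} file as-is"
    return f"convert {min(producer.formats)} file to {min(consumer.formats)} file"


native = Alg("buffer", is_native=True)
ext_a = Alg("ext_a", is_native=False, formats=frozenset({"shp", "gml"}))
ext_b = Alg("ext_b", is_native=False, formats=frozenset({"gml"}))
print(plan_transfer(native, ext_b))   # Case 2
print(plan_transfer(ext_a, ext_b))    # Case 3, common format
```

Picking min() of the format set is only there to make the choice
deterministic; a real implementation would want some notion of the
"best" format.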

> Another idea (obvious and already discussed a bit) is to adopt Processing
> to use recently added Task Manager, so algs can be executed in background
> and models can be executed using multiple threads when this is possible.

I think that's something that will happen in any case. We should be able
to combine the two ideas.

The different possibilities here are probably:

* Always use a file to pass data between algs:
    Heavy on disk I/O, which has a performance impact.

* Use a memory layer to pass data between algs:
    Heavy on memory, which may fail with big data (*)

* Iterator approach:
    Performance win. Most complex because it adds a lot of
    task interdependency and scheduling effort, and it may still use a
    lot of memory if the consumer algorithm is not fast enough to handle
    the data produced by the generator algorithm. So there should be
    some kind of coordination between the consumer and the producer to
    pause the producer task if the queue gets too big.
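
To illustrate the "pause the producer if the queue gets too big" idea,
here is a rough sketch using only the Python standard library. It is
not based on the Task Manager or any Processing API; run_pipeline and
its parameters are hypothetical names. A bounded queue does the
coordination: put() blocks once max_queued items are waiting, so the
producer thread is paused until the consumer catches up.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_pipeline(features, consume, max_queued=100):
    """Feed items from `features` to `consume` through a bounded queue."""
    q = queue.Queue(maxsize=max_queued)

    def producer():
        for feature in features:
            q.put(feature)   # blocks while the queue is full
        q.put(_DONE)

    # daemon=True so a failing consumer cannot leave the process hanging
    # on a producer that is blocked in put().
    worker = threading.Thread(target=producer, daemon=True)
    worker.start()

    results = []
    while True:
        item = q.get()
        if item is _DONE:
            break
        results.append(consume(item))
    worker.join()
    return results


# At most 10 items are ever buffered, however many are produced.
doubled = run_pipeline(range(1000), lambda x: x * 2, max_queued=10)
print(len(doubled))
```

Memory use is then bounded by max_queued rather than by the size of
the data set, at the cost of the producer sometimes sitting idle.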

Best regards
Matthias

(*) I always like to use the term big data. It makes me feel like a
sales person.

