[QGIS-Developer] Running grass algorithms in threads

Vaclav Petras wenzeslaus at gmail.com
Tue Aug 14 17:52:14 PDT 2018


On Tue, Aug 14, 2018 at 7:10 PM, Nyall Dawson <nyall.dawson at gmail.com>
wrote:

> On Tue, 14 Aug 2018 at 21:43, Rudi von Staden <rudivs at gmail.com> wrote:
> >
> > Hi all,
> >
> > The bottleneck in my script at the moment is the calculation of zonal
> > stats using 'grass7:r.stats.zonal'. I thought I might speed things up by
> > using QgsTask.fromFunction() or QgsProcessingAlgRunnerTask() to run these
> > calculations in parallel. In my tests of both approaches the tasks seem to
> > complete (task.status() == QgsTask.Complete), but the output file is only
> > generated for 1 of 4 parallel tasks (the task that finishes first).
> >
> > I'm assuming this is because grass algorithms are not thread safe? Or am
> > I missing something in my implementation that could make this work?
>
> I strongly suspect that grass algorithms cannot be run in parallel.
> This is why they cannot run in the background in QGIS like the
> native/GDAL algorithms can. But I'd love for confirmation about this
> and whether there's any way to make GRASS multi-thread safe.
>


In general, it works. You can run GRASS modules in parallel if you set
things up right, which is best achieved by running the parallel processes
in separate GRASS mapsets.

GRASS modules are separate processes, so we are talking about parallel
processes rather than threads, and they are fairly well isolated from each
other. For example, you can run (assuming a GRASS GIS session in the NC SPM
location, a new/empty mapset, and a Linux command line, so that & starts
the process in the background):

r.neighbors input=elevation output=elevation_1 size=21 &
r.neighbors input=elevation output=elevation_2 size=21 &
r.neighbors input=elevation output=elevation_3 size=21 &
r.neighbors input=elevation output=elevation_4 size=21 &
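
The same four runs could also be launched from Python. What follows is only
a minimal sketch, assuming it is executed inside an active GRASS session so
that r.neighbors is on the PATH; run_parallel and neighbors_commands are
hypothetical names used for illustration, not part of GRASS or QGIS.

```python
import subprocess


def run_parallel(commands):
    """Start every command as its own OS process, then wait for all of them.

    `commands` is a list of argument lists. The exit codes are returned in
    the same order as the input.
    """
    # Popen returns immediately, so all processes run at the same time.
    processes = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until the given process ends and returns its exit code.
    return [proc.wait() for proc in processes]


# The four r.neighbors calls from above, written as argument lists.
neighbors_commands = [
    ["r.neighbors", "input=elevation", f"output=elevation_{i}", "size=21"]
    for i in range(1, 5)
]

# Inside a GRASS session one would then call (not executed here):
#     exit_codes = run_parallel(neighbors_commands)
```

A non-zero entry in the returned list means the corresponding module
failed, so it is worth checking the exit codes before using the outputs.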

However, conflicts may arise if you are changing the computational region
at the same time as doing calculations, or if you are writing vectors using
the default settings for attribute tables, i.e. one SQLite db for all
vector maps in a mapset. You can make these things work, e.g. by passing
the computational region through the environment rather than by g.region,
or by using a different backend for attributes. However, the safest
approach is to use separate mapsets, for example (assuming Linux for & and
an existing location called nc_spm):

# create the mapsets
grass -e -c ~/grassdata/nc_spm/par1
grass -e -c ~/grassdata/nc_spm/par2
grass -e -c ~/grassdata/nc_spm/par3
grass -e -c ~/grassdata/nc_spm/par4
# run v.random (just an example which creates vector with attributes)
grass ~/grassdata/nc_spm/par1 --exec v.random output=points_1 column=value npoints=1000000 &
grass ~/grassdata/nc_spm/par2 --exec v.random output=points_2 column=value npoints=1000000 &
grass ~/grassdata/nc_spm/par3 --exec v.random output=points_3 column=value npoints=1000000 &
grass ~/grassdata/nc_spm/par4 --exec v.random output=points_4 column=value npoints=1000000 &
# wait for the background processes to finish
wait
# just to finish the example, let's merge the vectors in a new mapset
grass -e -c ~/grassdata/nc_spm/par
grass ~/grassdata/nc_spm/par --exec v.patch input=points_1@par1,points_2@par2,points_3@par3,points_4@par4 output=points
grass ~/grassdata/nc_spm/par --exec v.info map=points -t
# and delete the rest
rm -r ~/grassdata/nc_spm/par1
rm -r ~/grassdata/nc_spm/par2
rm -r ~/grassdata/nc_spm/par3
rm -r ~/grassdata/nc_spm/par4
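
The one-mapset-per-process pattern can be sketched in Python as well. This
is an illustration under assumptions, not a tested recipe: it assumes the
grass executable is on the PATH and reuses only the -e, -c and --exec
options shown above. The helper names (create_mapset_command, exec_command,
run_in_separate_mapsets) are made up for this example, and the `run`
parameter exists only so the command lines can be inspected without GRASS.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def create_mapset_command(mapset_path):
    """Command line that creates a mapset and exits (grass -e -c)."""
    return ["grass", "-e", "-c", mapset_path]


def exec_command(mapset_path, module, **options):
    """Command line that runs one module in the given mapset via --exec."""
    args = [f"{key}={value}" for key, value in options.items()]
    return ["grass", mapset_path, "--exec", module] + args


def run_in_separate_mapsets(location_path, jobs, run=subprocess.run):
    """Run each job in its own mapset, with the jobs running concurrently.

    `jobs` maps a mapset name to a (module, options) pair.
    """
    paths = {name: os.path.join(location_path, name) for name in jobs}
    # Create the mapsets one after another, as in the shell example.
    for path in paths.values():
        run(create_mapset_command(path), check=True)

    def one_job(name):
        module, options = jobs[name]
        return run(exec_command(paths[name], module, **options), check=True)

    # The threads only wait on child processes; the actual work happens in
    # the separate GRASS processes, one per mapset.
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        return list(pool.map(one_job, jobs))


# Example job list matching the v.random runs above (not executed here):
#     jobs = {
#         f"par{i}": ("v.random", {"output": f"points_{i}",
#                                  "column": "value", "npoints": 1000000})
#         for i in range(1, 5)
#     }
#     run_in_separate_mapsets(os.path.expanduser("~/grassdata/nc_spm"), jobs)
```

Merging the results and removing the temporary mapsets would still be done
afterwards, as in the shell commands above.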

I'm assuming that we are talking about running algorithms in parallel in
QGIS, not about parallelism inside the algorithms. Other considerations
apply to the latter (parallelization is controlled by the modules
themselves, see e.g. the nprocs option for r.sun or v.surf.rst in GRASS GIS
7.4). Note that I'm talking about (pure) GRASS, so it depends on how QGIS
is handling it (I recall it is using --exec, but I don't know what it is
doing with locations and mapsets). Please also note that I didn't measure
whether the v.random example would actually be faster than a single
process.
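
Whether running in parallel pays off can be checked by timing both
variants. The following is a rough sketch with hypothetical helper names
(time_serial, time_parallel). It measures wall-clock time only, and the
result will depend on disk I/O and the number of cores as much as on the
module itself.

```python
import subprocess
import time


def time_serial(commands):
    """Wall-clock seconds needed to run the commands one after another."""
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def time_parallel(commands):
    """Wall-clock seconds needed to run all commands at once."""
    start = time.perf_counter()
    # Start every command as a separate process before waiting on any.
    processes = [subprocess.Popen(cmd) for cmd in commands]
    for proc in processes:
        # Fail loudly if any process reports an error.
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return time.perf_counter() - start
```

Both functions take a list of argument lists, for example the four
grass ... --exec v.random command lines from above split into arguments.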

Best,
Vaclav


> Because this is grass related (and not QGIS specific) I'd suggest
> asking on the grass mailing list, and relaying any responses back
> here.
>

> Nyall
>
> >
> > Thanks,
> > Rudi
> >
> >
> >
> > My code for the QgsTask approach is as below:
> >
> > def getZonal(task, habitatModelFile, cover):
> >     tempFile = QgsProcessingUtils.generateTempFilename("output.tif")
> >     processing.run("grass7:r.stats.zonal", {
> >         'base':habitatModelFile,
> >         'cover':cover,
> >         'method':5,
> >         '-c':False,
> >         '-r':False,
> >         'output':tempFile,
> >         'GRASS_REGION_PARAMETER':None,
> >         'GRASS_REGION_CELLSIZE_PARAMETER':0,
> >         'GRASS_RASTER_FORMAT_OPT':'',
> >         'GRASS_RASTER_FORMAT_META':''},
> >         context=context, feedback=algFeedback)
> >
> >     if task.isCanceled():
> >         deleteFile(tempFile)
> >         return
> >
> >     return tempFile
> >
> > ls90Task = QgsTask.fromFunction('LS90', getZonal,
> >                                 habitatModelFile=hm1, cover=ls90Layer)
> > QgsApplication.taskManager().addTask(ls90Task)
> > feedback.pushInfo("Calculating LS14 mean...")
> > ls14Task = QgsTask.fromFunction('LS14 ', getZonal,
> >                                 habitatModelFile=hm2, cover=ls14Layer)
> > QgsApplication.taskManager().addTask(ls14Task)
> > hs90Task = QgsTask.fromFunction('HS90 ', getZonal,
> >                                 habitatModelFile=hm3, cover=hs90Layer)
> > QgsApplication.taskManager().addTask(hs90Task)
> > hs14Task = QgsTask.fromFunction('HS14 ', getZonal,
> >                                 habitatModelFile=hm4, cover=hs14Layer)
> > QgsApplication.taskManager().addTask(hs14Task)
> >
> > while (len([t for t in [ls90Task.status(), ls14Task.status(),
> >                         hs90Task.status(), hs14Task.status()]
> >             if t in [QgsTask.Running, QgsTask.Queued]]) > 0
> >        and not feedback.isCanceled()):
> >     sleep(1)
> >
> > if feedback.isCanceled():
> >     # some cleanup code (send task.cancel() and wait for tasks to terminate)
> >     break
> >
> > ls90Result = ls90Task.returned_values
> > ls14Result = ls14Task.returned_values
> > hs90Result = hs90Task.returned_values   # only this file exists
> > hs14Result = hs14Task.returned_values
> >
> >
> > _______________________________________________
> > QGIS-Developer mailing list
> > QGIS-Developer at lists.osgeo.org
> > List info: https://lists.osgeo.org/mailman/listinfo/qgis-developer
> > Unsubscribe: https://lists.osgeo.org/mailman/listinfo/qgis-developer