[GRASS-user] multiprocessing in python
Moritz Lennert
mlennert at club.worldonline.be
Wed Feb 7 00:59:24 PST 2018
[Please always keep the list in CC.]
On 06/02/18 22:57, Leonardo Hardtke wrote:
> Hi, thanks Moritz.
> I tried your suggestion, but I still get the same error...
>
> As a side note, if the process does not read any data in, it works as
> expected (i.e. when the for loop is commented out).
Can you identify which specific call in the loop triggers the error?
Have you tried launching with
pool.map(tile_process, [1, 2]) ?
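
One way to narrow this down is to catch the exception inside the worker and
hand the formatted traceback back to the parent, since (especially on
Python 2) the exception re-raised by pool.map does not show where in the
worker it came from. A minimal sketch, written for Python 3;
run_with_traceback and demo_tile_process are hypothetical names, and
demo_tile_process is only a stand-in for the real tile_process:

```python
import functools
import multiprocessing
import traceback


def run_with_traceback(func, arg):
    """Call func(arg); on failure return the traceback text instead of raising."""
    try:
        return ("ok", func(arg))
    except Exception:
        # format_exc() records the file and line of the failing call
        # as seen inside the worker process.
        return ("error", traceback.format_exc())


def demo_tile_process(tile_index):
    """Hypothetical stand-in for tile_process: fails for tile 2 only."""
    if tile_index == 2:
        raise AssertionError("simulated failure")
    return tile_index * 10


if __name__ == "__main__":
    wrapped = functools.partial(run_with_traceback, demo_tile_process)
    with multiprocessing.Pool(2) as pool:
        for status, payload in pool.map(wrapped, [1, 2]):
            print(status, payload)
```

With the real code, demo_tile_process would simply be replaced by
tile_process; the last frame of the returned traceback then names the call
that fails.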
>
> I have a similar approach working OK with plain gdal
> (https://gist.github.com/leohardtke/b54e79ed93546c0db840c7b5e951a6ce).
>
> There must be something with the grass raster python module, but I can't
> figure it out.
Not sure if it is raster, or rather temporal dataset handling. I don't
have time to look at this in detail now, so I'm putting grass-dev in CC
so you might get some answers from people more knowledgeable in temporal
data processing than me.
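
For reference, the message itself comes from Python's multiprocessing
module: Process.is_alive() asserts that it is called from the process that
created the Process object. The sketch below reproduces the message without
any GRASS code (Python 3, POSIX "fork" start method; all names are
hypothetical). Whether something similar happens here, e.g. an object
created in the parent that holds on to a helper process and is then used
inside the pool workers, is only an assumption that would need checking.

```python
import multiprocessing
import time


def _sleep_briefly():
    """Helper process body: just stay alive for a moment."""
    time.sleep(1)


def _probe_inherited(helper, conn):
    """Runs in the forked worker: query a Process object created by the parent."""
    try:
        helper.is_alive()
        conn.send("no error")
    except AssertionError as exc:
        conn.send("AssertionError: %s" % exc)
    finally:
        conn.close()


def reproduce():
    """Start a helper in the parent, then test it from a second child."""
    ctx = multiprocessing.get_context("fork")  # not available on Windows
    helper = ctx.Process(target=_sleep_briefly)
    helper.start()
    recv_end, send_end = ctx.Pipe(duplex=False)
    worker = ctx.Process(target=_probe_inherited, args=(helper, send_end))
    worker.start()
    send_end.close()  # the parent keeps only the receiving end
    message = recv_end.recv()
    worker.join()
    helper.terminate()
    helper.join()
    return message


if __name__ == "__main__":
    print(reproduce())
```

If this is indeed the mechanism, creating such objects inside each worker
rather than sharing them through globals would be the thing to try.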
A bit more info (e.g. more details of the code, such as the definition
of your pool, but also OS, versions, etc) might be helpful.
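
Something along these lines would collect most of that in one go. It is
only a sketch: environment_report is a hypothetical helper, the GISBASE
check assumes the script is run inside a GRASS session, and the start
method is only available on Python 3.4 and later.

```python
import multiprocessing
import os
import platform
import sys


def environment_report():
    """Collect OS, Python and library versions relevant to this problem."""
    info = {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "cpu_count": multiprocessing.cpu_count(),
    }
    # get_start_method() only exists on Python 3.4+.
    get_start = getattr(multiprocessing, "get_start_method", None)
    info["start_method"] = get_start(allow_none=True) if get_start else "n/a"
    try:
        import numpy
        info["numpy"] = numpy.__version__
    except ImportError:
        info["numpy"] = "not importable"
    # Only ask GRASS for its version when a session appears to be active.
    info["grass"] = "no GRASS session detected"
    if os.environ.get("GISBASE"):
        try:
            import grass.script as gscript
            info["grass"] = gscript.version().get("version", "unknown")
        except Exception:
            info["grass"] = "grass.script not usable"
    return info


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print("%s: %s" % (key, value))
```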
Moritz
>
> Cheers
>
> On 7 February 2018 at 00:47, Moritz Lennert
> <mlennert at club.worldonline.be <mailto:mlennert at club.worldonline.be>> wrote:
>
> On 06/02/18 12:09, Leonardo Hardtke wrote:
>
> Dear all,
> I am working on a module, implemented in Python/Cython and making
> use of gscript and other GRASS functionality, that extracts
> phenological parameters (like TIMESAT) from a time series.
> It works great on a 256x256 and, as the plan is to apply it over
> Australia at 250m over 17 years, I need to split the process into
> small tiles. The idea is to run these processes in parallel and I
> am having issues implementing it.
>
> This would be the first part of the process that runs on each tile:
>
> def tile_process(tile_index):
>     '''
>     Function for every worker:
>     Applies any function to the sub_region corresponding to
>     the tile_index.
>     '''
>     global Rows
>     global Cols
>     global RowBlockSize
>     global ColBlockSize
>     global full_region
>     global dates
>     global years
>     global indices
>     global data_serie
>     global yr_limits_extra
>     global yr_limits
>     global dbif
>
>     sub_name = 'block'
>     TileRow, TileCol, sr = sub_region(tile_index, full_region,
>                                       RowBlockSize, ColBlockSize)
>     # Define a temporary region based on the parameters calculated with the
>     start_row = TileRow * RowBlockSize
>     start_col = TileCol * ColBlockSize
>     n_rows = sr['rows']
>     n_cols = sr['cols']
>
>     strds = tgis.SpaceTimeRasterDataset(data_serie)
>     strds.select(dbif=dbif)
>     maps = strds.get_registered_maps_as_objects(dbif=dbif)
>
>     # Number of time steps
>     steps = len(maps)
>     # Make an empty array
>     #print(steps)
>     EVI = np.empty([steps, n_rows, n_cols])
>     # fill the array
>     for step, map in enumerate(maps):
>         map.select(dbif=dbif)
>         image_name = map.get_name() + '@' + data_serie.split('@')[1]
>         #print("reading: {}".format(image_name))
>         EVI[step, :] = raster2numpy_sub(image_name, start_row, n_rows,
>                                         start_col, n_cols)
>     mean = EVI.mean()
>     print(mean)
> ....
> ....
> ....
>
>
> and this is how I start the multiprocess pool.
>
> pool.map(tile_process, xrange(RowBlockN*ColBlockN))
> pool.close()
> pool.join()
>
> and it gives me:
>
> AssertionError: can only test a child process
>
>
> Of course, if I call tile_process(0) or tile_process(1) etc. directly,
> the right result comes out.
>
> Do any of you have experience with this? Any suggestion would
> be welcome!
> Sorry for the messy code. It is still at an early stage.
>
>
> Just a wild guess: have you tried with range (which returns a list)
> instead of xrange (which returns an xrange object) ?
>
> Moritz
>
>
>
>
> --
> Dr. Leonardo A. Hardtke
> C3 UTS, Scientific Officer
> CB04.06.315.06
> Email: leonardoandres.hardtke at uts.edu.au
> <mailto:leonardoandres.hardtke at uts.edu.au> or leohardtke at gmail.com
> <mailto:leohardtke at gmail.com>