[gdal-dev] Parallelization slows down single gdal_calc process in python

Flippmoke flippmoke at gmail.com
Wed Mar 2 16:57:58 PST 2016


I'm on my phone so I can't explain fully, but there are several blockers in the GDAL library that prevent multithreading from being effective. Try using separate processes if parallelism is absolutely required.
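
As an illustration of the "separate processes" suggestion, here is a minimal sketch, not a definitive implementation. It assumes gdal_calc.py is on the PATH and uses only the options that appear in the quoted message (-A, -B, --outfile, --calc). The helper names build_command, run_command and run_all, and the file names in the example, are hypothetical. Each command is passed as an argument list to subprocess, so every calculation runs in its own OS process; the Python threads only wait for their child process to finish. Whether this actually speeds things up depends on where the time goes (disk I/O, for example, may still be the limit).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(a_path, b_path, out_path, k1, k2):
    """Return the gdal_calc.py argument list for out = A*k1 + B*k2."""
    return [
        "gdal_calc.py",
        "-A", a_path,
        "-B", b_path,
        "--outfile=" + out_path,
        "--calc=A*{}+B*{}".format(k1, k2),
    ]


def run_command(cmd):
    """Run one command in its own OS process and return its exit code."""
    return subprocess.call(cmd)


def run_all(commands, workers=4):
    """Run the commands with at most `workers` child processes at a time.

    The threads only block on their child process, so the raster work
    happens in separate processes rather than in Python threads.
    Exit codes are returned in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(run_command, commands))


if __name__ == "__main__":
    # Dry run with hypothetical file names: print the command only.
    print(build_command("filea_17_0400", "fileb_17_0400", "out_0400", 0.5, 0.3))
```

To execute a batch, pass a list of such commands to run_all and check the returned exit codes. Starting with a small number of workers and increasing it while timing each run is one way to see where the extra processes stop helping.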

Blake Thompson

> On Mar 2, 2016, at 5:44 PM, Lorenzo Bottaccioli <lorenzo.bottaccioli at gmail.com> wrote:
> 
> Hi,
> I'm trying to parallelize some raster calculation code that uses gdal_calc.py, but I get really bad results. I need to perform several raster operations of the form FILE_out = FILE_a*k1 + FILE_b*k2.
> 
> This is the code I'm using:
> 
> import os
> import time
> from multiprocessing import Pool
> 
> import pandas as pd
> 
> 
> def mapcalc(df):
>     # Lookup tables mapping month, hour and minute to the labels used in the file names.
>     month={1:'17',2:'47',3:'75',4:'105',5:'135',6:'162',7:'198',8:'228',9:'258',10:'288',11:'318',12:'344'}
>     hour={4:'04',5:'05',6:'06',7:'07',8:'08',9:'09',10:'10',11:'11',12:'12',13:'13',14:'14',15:'15',16:'16',17:'17',18:'18',19:'19',20:'20',21:'21',22:'22'}
>     minute={0:'00',15:'15',30:'30',45:'45'}
>     directory='/home/user/Raster/'
>     tmp='/home/usr/tmp/'
>     for i in df.index:
>         if 4<=i.hour<22:
>             timeg=time.time()
>             # Shared suffix of the two input file names for this timestamp.
>             stamp=month[i.month]+'_'+hour[i.hour]+minute[i.minute]
>             os.system('gdal_calc.py -A '+directory+'filea_'+stamp
>                       +' -B '+directory+'fileb_'+stamp
>                       +' --outfile='+tmp+str(i.date())+'_'+str(i.time())
>                       +' --calc=A*'+str(df.loc[i,'k1'])+'+B*'+str(df.loc[i,'k2']))
>             print(i,"--- %s seconds ---" % (time.time() - timeg))
> 
> 
> if __name__ == '__main__':
>     df = pd.read_csv('input.csv', sep=";", index_col='Date Time', decimal=',')
>     df.index = pd.to_datetime(df.index, unit='s')
> 
>     start_time = time.time()
>     pool=Pool(processes=8)
>     # Split the frame into chunks of 20 rows, one chunk per task.
>     pool.map(mapcalc,[df.iloc[i*20:(i+1)*20] for i in range(len(df.index)//20+1)])
>     pool.close()
>     pool.join()
>     print("--- %s seconds ---" % (time.time() - start_time))
> 
> If I run the code without parallelization, it takes around 650 s to complete the calculation, and each iteration of the for loop takes ~10 s. If I run it with parallelization, it takes ~900 s to complete and each iteration of the for loop takes ~30 s.
> 
> Why is that? How can I fix this?
> 
> Best L