[GRASS-user] v.net parallelisation issues

Mark Wynter mark at dimensionaledge.com
Thu Feb 12 23:39:56 PST 2015


I’ve encountered a bottleneck somewhere with v.net when scaling out with GNU Parallel… I’m not sure whether it’s an underlying issue with v.net or with the way I’m calling the batch jobs.

I’ve got 32 CPUs and commensurate RAM. What I’m observing is that the CPU utilisation of each v.net process drops as the number of concurrent jobs increases.

I’ve tried launching a single batch job with a single mapset, as well as multiple batch jobs each with their own mapset (and database). I’ve tried both PG and sqlite backends. Same issue.

The script at the bottom describes the approach of launching multiple batch jobs, each with its own mapset. Executing a single batch job and then launching parallel within the batch script is much cleaner code, but the results are no different.

I feel I’m so close, yet so far at such a critical stage of project delivery.

Hope someone can help

Kind regards
Mark

RESULTS

ONE JOB
TOTAL SCRIPT TIME: 70

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
31313 root      20   0 28876 4080 1284 S 76.5  0.0   0:20.25 sqlite                                         
31293 root      20   0  276m 134m 8320 S 68.5  0.2   0:20.22 v.net.distance     
—————————

TWO JOBS
TOTAL SCRIPT TIME: 96

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
21391 root      20   0 28876 4080 1284 R 53.0  0.0   0:01.90 sqlite                                         
21392 root      20   0 28876 4080 1284 R 52.6  0.0   0:01.86 sqlite                                         
21380 root      20   0  276m 128m 8320 R 49.3  0.2   0:04.02 v.net.distance                                 
21381 root      20   0  276m 128m 8320 S 48.3  0.2   0:03.97 v.net.distance
—————————

FOUR JOBS
TOTAL SCRIPT TIME: 187

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
 6953 mark      20   0  180m 100m 9520 S 63.6  0.2   1:47.39 x2goagent                                      
23025 root      20   0 28876 4080 1284 S 21.5  0.0   0:02.03 sqlite                                         
23026 root      20   0 28876 4080 1284 R 19.9  0.0   0:02.08 sqlite                                         
23027 root      20   0 28876 4080 1284 S 19.5  0.0   0:01.87 sqlite                                         
23028 root      20   0 28876 4080 1284 S 19.5  0.0   0:01.84 sqlite                                         
23014 root      20   0  276m 128m 8320 R 18.5  0.2   0:04.06 v.net.distance                                 
23012 root      20   0  276m 128m 8320 R 17.5  0.2   0:03.91 v.net.distance                                 
23011 root      20   0  276m 128m 8320 S 16.9  0.2   0:04.13 v.net.distance                                 
23015 root      20   0  276m 128m 8320 R 16.9  0.2   0:03.80 v.net.distance  
—————————

EIGHT JOBS
TOTAL SCRIPT TIME: 373

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                       
27157 root      20   0 28876 4088 1284 S 19.5  0.0   0:42.39 sqlite                                         
27162 root      20   0 28876 4088 1284 R 16.9  0.0   0:40.60 sqlite                                         
 6953 mark      20   0  181m 101m 9520 S 16.5  0.2   2:18.86 x2goagent                                      
27154 root      20   0 28876 4088 1284 S 16.5  0.0   0:39.38 sqlite                                         
27153 root      20   0 28876 4088 1284 S 16.2  0.0   0:35.60 sqlite                                         
27156 root      20   0 28876 4088 1284 R 16.2  0.0   0:38.18 sqlite                                         
27161 root      20   0 28876 4088 1284 S 15.9  0.0   0:40.96 sqlite                                         
27155 root      20   0 28876 4088 1284 S 15.6  0.0   0:38.41 sqlite                                         
27104 root      20   0  284m 139m 8332 S 14.9  0.2   0:39.94 v.net.distance                                 
27158 root      20   0 28876 4088 1284 R 14.6  0.0   0:37.49 sqlite                                         
27095 root      20   0  284m 138m 8332 S 14.2  0.2   0:34.48 v.net.distance                                 
27099 root      20   0  284m 138m 8332 S 14.2  0.2   0:38.27 v.net.distance                                 
27101 root      20   0  284m 139m 8332 R 14.2  0.2   0:38.80 v.net.distance                                 
27105 root      20   0  284m 139m 8332 R 14.2  0.2   0:37.95 v.net.distance                                 
27093 root      20   0  284m 138m 8332 R 13.9  0.2   0:32.64 v.net.distance                                 
27102 root      20   0  284m 140m 8332 R 13.6  0.2   0:40.90 v.net.distance                                 
27094 root      20   0  284m 138m 8332 R 13.2  0.2   0:35.78 v.net.distance  
—————————
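
To make the trend in the totals above easier to read, here is a rough sketch that turns them into time per job and speedup relative to the single-job run. It assumes every job performs a comparable amount of work, which the timings themselves do not confirm; the variable names are only illustrative.

```shell
#!/bin/bash
# Wall-clock totals copied from the RESULTS above, as "jobs seconds".
results="1 70
2 96
4 187
8 373"

# Assumption: each job does roughly the same amount of work, so the ideal
# total time would stay near the single-job time as jobs are added.
scaling=$(echo "$results" | awk '
  NR == 1 { base = $2 }   # single-job time is the baseline
  { printf "jobs=%d per-job=%.1fs speedup=%.2fx\n", $1, $2 / $1, ($1 * base) / $2 }')

echo "$scaling"
```

Under that assumption the speedup levels off at roughly 1.5x from two jobs onwards, which is the pattern one would expect if the jobs are queuing on some shared resource rather than running independently.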

################################################
############   WORKER FUNCTION    #############  
################################################
# CREATE MAPSETS AND BASH SCRIPTS FOR EACH CPU
fn_worker (){

#######################
# copy mapset
#######################
cp -R /var/tmp/jtw/PERMANENT /var/tmp/jtw/batch_"$1"

#######################
# generate batch_job file
#######################
echo -e '#!/bin/bash
dbsettings="/mnt/data/common/repos/cf_private/settings/current.sh"
source $dbsettings
cpu='$1'

jid=`psql -d $dbname -U $username -A -t -c "SELECT min(jid) FROM jtw.nsw_tz_joblist WHERE processed = false and cpu = '$1';"`
o_tz11=`psql -d $dbname -U $username -A -t -c "SELECT o_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
o_cat=`psql -d $dbname -U $username -A -t -c "SELECT o_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
d_cat=`psql -d $dbname -U $username -A -t -c "SELECT d_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
layername="temp_"$jid

v.net.distance --overwrite in=nsw_road_network_final_connected@batch_'$1' out=$layername from_layer=2 to_layer=2 from_cats=$d_cat to_cats=$o_cat arc_column=fwdcost arc_backward_column=bwdcost

v.out.ogr --o input=$layername output=/var/tmp/$layername type=line

ogr2ogr -overwrite -f "PostgreSQL" PG:"host=localhost dbname=$dbname user=$username password=$password" /var/tmp/$layername/$layername.shp -nln jtw.$layername -s_srs EPSG:3577 -t_srs EPSG:3577 -a_srs EPSG:3577 -nlt LINESTRING

psql -d $dbname -U $username -c "INSERT INTO jtw.nsw_tz_journey_paths
With s AS (SELECT a.cat, a.tcat, b.tz_code11 as o_tz11, c.tz_code11 as d_tz11, d.lid, d.wkb_geometry, e.employed_persons FROM jtw.$layername a, grass.nsw_tz_centroids_nodes b,  grass.nsw_tz_centroids_nodes c,  jtw.nsw_road_network_final_net_att d, jtw.nsw_tz_volumes e WHERE a.tcat = b.cat AND a.cat = c.cat AND ST_Equals(a.wkb_geometry, d.wkb_geometry) AND d.type <> '\'service_line\'' AND b.tz_code11 = e.o_tz11 AND c.tz_code11 = e.d_tz11 AND e.mode9 = 4) SELECT NEXTVAL('\'jtw.nsw_tz_journey_paths_jid_seq\''), o_tz11, d_tz11, lid, wkb_geometry, employed_persons FROM s; UPDATE jtw.nsw_tz_joblist SET processed = true WHERE jid = $jid;"

#end of job file' > /var/tmp/jtw/jobs/batch_$1.sh
#######################

chmod u+x /var/tmp/jtw/jobs/batch_$1.sh
}
export -f fn_worker

# remove previous mapsets before writing new files
rm -rf /var/tmp/jtw/batch*
rm -rf /var/tmp/jtw/jobs/batch*

#execute in parallel
seq 1 4 | parallel fn_worker {1}
wait
#######################


################################################
#######   JOB SCHEDULER    ########  
################################################

#\\\\\\\\\\\\\\\\\\\\\\\\\
START_T1=$(date +%s)
#\\\\\\\\\\\\\\\\\\\\\\\\\

fn_worker (){
export GRASS_BATCH_JOB=/var/tmp/jtw/jobs/batch_$1.sh
grass70 /var/tmp/jtw/batch_$1
unset GRASS_BATCH_JOB
}
export -f fn_worker

seq 1 4 | parallel fn_worker {1}
wait

#\\\\\\\\\\\\\\\\\\\\\\\\\
END_T1=$(date +%s)
#\\\\\\\\\\\\\\\\\\\\\\\\\
TOTAL_DIFF=$(( $END_T1 - $START_T1 ))
echo "TOTAL SCRIPT TIME: $TOTAL_DIFF"
#\\\\\\\\\\\\\\\\\\\\\\\\\

################################################
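
The scheduler above only reports one total for the whole run. As a possible diagnostic, the sketch below times each job separately, so it is visible whether every job slows down equally or only some do. `run_job` is a hypothetical stand-in (here just a short sleep) for the real `grass70 /var/tmp/jtw/batch_N` call, and plain background jobs are used instead of GNU Parallel to keep the example self-contained.

```shell
#!/bin/bash
# Hypothetical stand-in for the real batch call; replace the body with
# the GRASS_BATCH_JOB export and grass70 invocation from the scheduler.
run_job() {
  sleep 1
}

# Run one job and report its own elapsed wall-clock time in seconds.
time_job() {
  local id=$1 start end
  start=$(date +%s)
  run_job "$id"
  end=$(date +%s)
  echo "job $id: $(( end - start )) s"
}

# Launch four jobs in the background, wait for all, and collect the reports.
timings=$(for id in 1 2 3 4; do time_job "$id" & done; wait)
echo "$timings"
```

If every job reports a similar, inflated time as the job count grows, that points towards contention on something the jobs share rather than towards the scheduling itself.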


>> 
>> The slow rate of writing out the v.net.allpairs results from
>> PostgreSQL was due to the sheer volume of line strings, as the number
>> of pairs increased (n^2).  Simple math said stop.   I’ve since
>> changed my approach and am using v.net.distance in a novel way where
>> the to_cat is the origin, and the from_cat is a string of
>> destinations - this is an equivalent way of generating multiple
>> v.net.paths in a single operation.  Moreover, I’m feeding each origin
>> - destination collection into GNU Parallel as a separate job, so it
>> rips through the data at scale!
> 