[GRASS-user] v.net parallelisation issues
Mark Wynter
mark at dimensionaledge.com
Thu Feb 12 23:39:56 PST 2015
I’ve encountered a bottleneck somewhere with v.net when scaling out with GNU Parallel. I’m not sure whether it’s an underlying issue with v.net or the way I’m calling the batch jobs.
I’ve got 32 CPUs and commensurate RAM. What I’m observing is that the CPU utilisation of each v.net process drops off as the number of concurrent jobs increases, and the total run time grows with it.
I’ve tried launching a single batch job with a single mapset, as well as multiple batch jobs each with their own mapset (and database). I’ve tried both PG and sqlite backends. Same issue.
The script at the bottom describes the approach of launching multiple batch jobs, each with its own mapset. Executing a single batch job and then launching parallel within the batch script is much cleaner code, but the results are no different.
I feel I’m so close, yet so far at such a critical stage of project delivery.
Hope someone can help.
Kind regards
Mark
RESULTS
ONE JOB
TOTAL SCRIPT TIME: 70
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31313 root 20 0 28876 4080 1284 S 76.5 0.0 0:20.25 sqlite
31293 root 20 0 276m 134m 8320 S 68.5 0.2 0:20.22 v.net.distance
—————————
TWO JOBS
TOTAL SCRIPT TIME: 96
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
21391 root 20 0 28876 4080 1284 R 53.0 0.0 0:01.90 sqlite
21392 root 20 0 28876 4080 1284 R 52.6 0.0 0:01.86 sqlite
21380 root 20 0 276m 128m 8320 R 49.3 0.2 0:04.02 v.net.distance
21381 root 20 0 276m 128m 8320 S 48.3 0.2 0:03.97 v.net.distance
—————————
FOUR JOBS
TOTAL SCRIPT TIME: 187
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6953 mark 20 0 180m 100m 9520 S 63.6 0.2 1:47.39 x2goagent
23025 root 20 0 28876 4080 1284 S 21.5 0.0 0:02.03 sqlite
23026 root 20 0 28876 4080 1284 R 19.9 0.0 0:02.08 sqlite
23027 root 20 0 28876 4080 1284 S 19.5 0.0 0:01.87 sqlite
23028 root 20 0 28876 4080 1284 S 19.5 0.0 0:01.84 sqlite
23014 root 20 0 276m 128m 8320 R 18.5 0.2 0:04.06 v.net.distance
23012 root 20 0 276m 128m 8320 R 17.5 0.2 0:03.91 v.net.distance
23011 root 20 0 276m 128m 8320 S 16.9 0.2 0:04.13 v.net.distance
23015 root 20 0 276m 128m 8320 R 16.9 0.2 0:03.80 v.net.distance
—————————
EIGHT JOBS
TOTAL SCRIPT TIME: 373
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27157 root 20 0 28876 4088 1284 S 19.5 0.0 0:42.39 sqlite
27162 root 20 0 28876 4088 1284 R 16.9 0.0 0:40.60 sqlite
6953 mark 20 0 181m 101m 9520 S 16.5 0.2 2:18.86 x2goagent
27154 root 20 0 28876 4088 1284 S 16.5 0.0 0:39.38 sqlite
27153 root 20 0 28876 4088 1284 S 16.2 0.0 0:35.60 sqlite
27156 root 20 0 28876 4088 1284 R 16.2 0.0 0:38.18 sqlite
27161 root 20 0 28876 4088 1284 S 15.9 0.0 0:40.96 sqlite
27155 root 20 0 28876 4088 1284 S 15.6 0.0 0:38.41 sqlite
27104 root 20 0 284m 139m 8332 S 14.9 0.2 0:39.94 v.net.distance
27158 root 20 0 28876 4088 1284 R 14.6 0.0 0:37.49 sqlite
27095 root 20 0 284m 138m 8332 S 14.2 0.2 0:34.48 v.net.distance
27099 root 20 0 284m 138m 8332 S 14.2 0.2 0:38.27 v.net.distance
27101 root 20 0 284m 139m 8332 R 14.2 0.2 0:38.80 v.net.distance
27105 root 20 0 284m 139m 8332 R 14.2 0.2 0:37.95 v.net.distance
27093 root 20 0 284m 138m 8332 R 13.9 0.2 0:32.64 v.net.distance
27102 root 20 0 284m 140m 8332 R 13.6 0.2 0:40.90 v.net.distance
27094 root 20 0 284m 138m 8332 R 13.2 0.2 0:35.78 v.net.distance
—————————
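As a rough check on the timings above, the totals can be converted into throughput (jobs completed per second). This is only a sketch: it assumes every batch job does a comparable amount of work, which the results do not state, and the helper name `throughput` is made up for illustration.

```shell
#!/bin/bash
# Throughput implied by the TOTAL SCRIPT TIME figures reported above.
# Assumption: each batch job performs a comparable amount of work.
throughput() {
    # $1 = number of concurrent jobs, $2 = total script time in seconds
    awk -v jobs="$1" -v secs="$2" 'BEGIN { printf "%.4f\n", jobs / secs }'
}

# One line per run: job count and total time, taken from the results above.
while read -r jobs secs; do
    printf '%s job(s): %s s total, %s jobs/s\n' \
        "$jobs" "$secs" "$(throughput "$jobs" "$secs")"
done <<'EOF'
1 70
2 96
4 187
8 373
EOF
```

Under that assumption the throughput goes from about 0.014 jobs/s with one job to about 0.021 jobs/s with two, and then stays at about 0.021 jobs/s for four and eight jobs. That would be consistent with the jobs queueing on something shared rather than being limited by the number of CPUs, although the figures alone do not identify what that shared resource is.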
################################################
############ WORKER FUNCTION #############
################################################
# CREATE MAPSETS AND BASH SCRIPTS FOR EACH CPU
fn_worker (){
#######################
# copy mapset
#######################
cp -R /var/tmp/jtw/PERMANENT /var/tmp/jtw/batch_"$1"
#######################
# generate batch_job file
#######################
echo -e '#!/bin/bash
dbsettings="/mnt/data/common/repos/cf_private/settings/current.sh"
source $dbsettings
cpu='$1'
jid=`psql -d $dbname -U $username -A -t -c "SELECT min(jid) FROM jtw.nsw_tz_joblist WHERE processed = false and cpu = '$1';"`
o_tz11=`psql -d $dbname -U $username -A -t -c "SELECT o_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
o_cat=`psql -d $dbname -U $username -A -t -c "SELECT o_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
d_cat=`psql -d $dbname -U $username -A -t -c "SELECT d_tz11 FROM jtw.nsw_tz_joblist WHERE jid = $jid;"`
layername="temp_"$jid
v.net.distance --overwrite in=nsw_road_network_final_connected@batch_'$1' out=$layername from_layer=2 to_layer=2 from_cats=$d_cat to_cats=$o_cat arc_column=fwdcost arc_backward_column=bwdcost
v.out.ogr --o input=$layername output=/var/tmp/$layername type=line
ogr2ogr -overwrite -f "PostgreSQL" PG:"host=localhost dbname=o$dbname user=$username password=$password" /var/tmp/$layername/$layername.shp -nln jtw.$layername -s_srs EPSG:3577 -t_srs EPSG:3577 -a_srs EPSG:3577 -nlt LINESTRING
psql -d $dbname -U $username -c "INSERT INTO jtw.nsw_tz_journey_paths
With s AS (SELECT a.cat, a.tcat, b.tz_code11 as o_tz11, c.tz_code11 as d_tz11, d.lid, d.wkb_geometry, e.employed_persons FROM jtw.$layername a, grass.nsw_tz_centroids_nodes b, grass.nsw_tz_centroids_nodes c, jtw.nsw_road_network_final_net_att d, jtw.nsw_tz_volumes e WHERE a.tcat = b.cat AND a.cat = c.cat AND ST_Equals(a.wkb_geometry, d.wkb_geometry) AND d.type <> '\'service_line\'' AND b.tz_code11 = e.o_tz11 AND c.tz_code11 = e.d_tz11 AND e.mode9 = 4) SELECT NEXTVAL('\'jtw.nsw_tz_journey_paths_jid_seq\''), o_tz11, d_tz11, lid, wkb_geometry, employed_persons FROM s; UPDATE jtw.nsw_tz_joblist SET processed = true WHERE jid = $jid;"
#end of job file' > /var/tmp/jtw/jobs/batch_$1.sh
#######################
chmod u+x /var/tmp/jtw/jobs/batch_$1.sh
}
export -f fn_worker
# remove previous mapsets before writing new files
rm -rf /var/tmp/jtw/batch*
rm -rf /var/tmp/jtw/jobs/batch*
#execute in parallel
seq 1 4 | parallel fn_worker {1}
wait
#######################
################################################
####### JOB SCHEDULER ########
################################################
#\\\\\\\\\\\\\\\\\\\\\\\\\
START_T1=$(date +%s)
#\\\\\\\\\\\\\\\\\\\\\\\\\
fn_worker (){
export GRASS_BATCH_JOB=/var/tmp/jtw/jobs/batch_$1.sh
grass70 /var/tmp/jtw/batch_$1
unset GRASS_BATCH_JOB
}
export -f fn_worker
seq 1 4 | parallel fn_worker {1}
wait
#\\\\\\\\\\\\\\\\\\\\\\\\\
END_T1=$(date +%s)
#\\\\\\\\\\\\\\\\\\\\\\\\\
TOTAL_DIFF=$(( $END_T1 - $START_T1 ))
echo "TOTAL SCRIPT TIME: $TOTAL_DIFF"
#\\\\\\\\\\\\\\\\\\\\\\\\\
################################################
>>
>> The slow rate of writing out the v.net.allpairs results from
>> PostgreSQL was due to the sheer volume of line strings, as the number
>> of pairs increased (n^2). Simple math said stop. I’ve since
>> changed my approach and am using v.net.distance in a novel way where
>> the to_cat is the origin, and the from_cat is a string of
>> destinations - this is an equivalent way of generating multiple
>> v.net.paths in a single operation. Moreover, I’m feeding each origin
>> - destination collection into GNU Parallel as a separate job, so it
>> rips through the data at scale!
>
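The "string of destinations" idea in the quoted text could be sketched as below. This is an illustration only: the category numbers, the map names `network` and `temp_paths`, and the helper `join_cats` are all hypothetical, and the command is printed rather than executed so the sketch runs without GRASS.

```shell
#!/bin/bash
# Join destination category numbers into the comma-separated form
# passed to from_cats=. All values used here are hypothetical.
join_cats() {
    local IFS=,
    echo "$*"
}

o_cat=12                       # hypothetical origin category
d_cat=$(join_cats 3 7 15 42)   # hypothetical destination categories

# Print the command instead of running it (map names are placeholders).
echo "v.net.distance in=network out=temp_paths from_layer=2 to_layer=2 from_cats=$d_cat to_cats=$o_cat"
```

Each origin then needs a single v.net.distance call that covers all of its destinations, which is what makes one origin a natural unit of work for GNU Parallel.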