[SAC] Re: [OSGeo] #574: OSGeo SVN server(s) unresponsive during certain hours of the day

Wed May 26 18:17:50 EDT 2010

#574: OSGeo SVN server(s) unresponsive during certain hours of the day
---------------------+------------------------------------------------------
  Reporter:  jng     |       Owner:  sac at lists.osgeo.org
      Type:  defect  |      Status:  new                
  Priority:  normal  |   Component:  Systems Admin      
Resolution:          |    Keywords:                     
---------------------+------------------------------------------------------
Comment (by hamish):

 Replying to [comment:1 crschmidt]:
 >  2. We don't know why.

 I don't like that situation, so I wrote this script and have started
 running it on xblade13 and xblade14 as a test. I will post a plot of the
 results after some time. If you think it is useful, feel free to run it on
 the more strained servers.

 log_cpu.sh:
 {{{
 #!/bin/sh

 # script to log cpu use etc.

 # log every 5 minutes
 interval=300

 outfile=~/"cpu_use.`hostname`.log"

 #echo "Will consume about $((50 * 3600/300 * 24 / 1024)) kb/day"

 echo "#year/day hr:min TZ cpu_1min_avg cpu_5min_avg cpu_15min_avg cpu_hog hog_cpu% free_mem_mb" >> "$outfile"

 while [ 1 -eq 1 ] ; do
    unset TIMESTAMP CPU_USAGE CPU_HOG FREE_MEM
    TIMESTAMP=`date -u '+%Y/%j %k:%M UTC'`
    CPU_USAGE=`uptime | cut -f5 -d: | sed -e 's/,//g' -e 's/^ //'`
    CPU_HOG=`top -b -n 1 | sed -e '1,7d' | head -n 1 | awk '{print $12 " " $9}'`
    FREE_MEM=`free -m | grep 'buffers/cache' | awk '{print $4}'`
    sleep 1
    echo "$TIMESTAMP $CPU_USAGE $CPU_HOG $FREE_MEM" >> "$outfile"
    sleep `expr $interval - 1`
 done
 }}}

 Example output from xblade13 this morning:
 {{{
 #year/day hr:min TZ cpu_1min_avg cpu_5min_avg cpu_15min_avg cpu_hog hog_cpu% free_mem_mb
 2010/146 20:21 UTC 0.00 0.03 0.00 rhn-applet-gui 2.0 894
 2010/146 20:26 UTC 0.01 0.02 0.00 rhn-applet-gui 2.0 896
 2010/146 20:31 UTC 0.00 0.00 0.00 init 0.0 896
 2010/146 20:36 UTC 0.25 0.10 0.04 top 2.0 896
 2010/146 20:41 UTC 0.06 0.06 0.02 top 2.0 898
 2010/146 20:46 UTC  Xvnc 1.9 898
 2010/146 20:51 UTC  top 2.0 899
 2010/146 20:56 UTC  top 3.9 898
 2010/146 21:01 UTC  top 1.9 896
 2010/146 21:06 UTC  init 0.0 896
 2010/146 21:11 UTC  init 0.0 897
 2010/146 21:16 UTC  httpd 1.9 897
 2010/146 21:21 UTC  top 3.9 897
 2010/146 21:26 UTC  top 1.9 900
 2010/146 21:31 UTC  top 1.9 897
 2010/146 21:36 UTC  top 3.9 898
 2010/146 21:41 UTC  top 2.0 898
 2010/146 21:46 UTC 0.01 0.03 0.00 top 1.9 898
 2010/146 21:51 UTC 0.05 0.03 0.00 init 0.0 900
 2010/146 21:56 UTC 0.20 0.11 0.02 init 0.0 901
 2010/146 22:01 UTC 0.14 0.09 0.03 httpd 2.0 900
 2010/146 22:06 UTC 0.00 0.04 0.01 top 2.0 901
 2010/146 22:11 UTC 0.17 0.11 0.03 nscd 2.0 900
 }}}
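
 In the sample above, the rows from 20:46 to 21:41 carry only the hog name,
 its CPU% and the free memory, with no load averages. A rough way to find
 such rows in a longer log is to count fields: a complete row has 9
 whitespace-separated fields. The function name `find_incomplete` is made up
 for this sketch, and the sample lines are copied from the output above.

```shell
#!/bin/sh
# Print the date and time of every data row that has fewer than the
# expected 9 fields (date, time, TZ, 3 load averages, hog, hog_cpu%, mem).
# Header lines are skipped because their first field is not "year/day".
find_incomplete() {
    awk '$1 ~ /^[0-9]+\/[0-9]+$/ && NF < 9 { print $1, $2 }'
}

find_incomplete <<'EOF'
#year/day hr:min TZ cpu_1min_avg cpu_5min_avg cpu_15min_avg cpu_hog hog_cpu% free_mem_mb
2010/146 20:41 UTC 0.06 0.06 0.02 top 2.0 898
2010/146 20:46 UTC  Xvnc 1.9 898
2010/146 21:41 UTC  top 2.0 898
2010/146 21:46 UTC 0.01 0.03 0.00 top 1.9 898
EOF
```

 Against a real log it would be run as
 `find_incomplete < ~/cpu_use.$(hostname).log`.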

 Hmm, that's weird: the load averages are missing from 20:46 to 21:41, which
 looks like a bug in the `uptime` parsing. I may have to replace `cut -f5 -d:`
 with `sed -e 's/.*average://'`.
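
 One plausible cause, offered as an assumption rather than a confirmed
 diagnosis: Linux `uptime` prints the time since boot as `1 day,  1:25` once
 a full hour has passed but as `1 day, 25 min` before that, so the number of
 colons on the line changes and field 5 of `cut -d:` is sometimes empty.
 Anchoring on the `average:` label avoids counting colons. A minimal sketch
 of that approach follows; the function name `parse_load` and both sample
 lines are invented for illustration and were not captured on xblade13.

```shell
#!/bin/sh
# Extract the three load averages from one line of Linux `uptime` output.
# Everything up to and including "average:" is dropped, then the commas
# between the three numbers are removed.
parse_load() {
    printf '%s\n' "$1" | sed -e 's/.*average: *//' -e 's/,//g'
}

# Uptime with an hours:minutes part (extra colon on the line).
parse_load ' 21:46:01 up 1 day,  1:00,  2 users,  load average: 0.01, 0.03, 0.00'
# Uptime with a "min" part (one colon fewer on the line).
parse_load ' 20:46:01 up 1 day, 25 min,  2 users,  load average: 0.00, 0.02, 0.00'

# In log_cpu.sh the assignment would then read:
#   CPU_USAGE=`uptime | sed -e 's/.*average: *//' -e 's/,//g'`
```

 On Linux the same three numbers are also the first three fields of
 `/proc/loadavg`, which avoids parsing `uptime` at all.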

 Personally, I just trained myself not to be on the computer from 7-10pm
 local time as a workaround for this issue :)

 regards,
 Hamish

-- 
Ticket URL: <https://trac.osgeo.org/osgeo/ticket/574#comment:2>
OSGeo <http://www.osgeo.org/>
OSGeo committee and general foundation issue tracker.