[SAC] xblade14 distress

Markus Neteler neteler at osgeo.org
Mon Jan 11 16:42:54 EST 2010


Hi again,

the blade finally gave me a command line :)

top shows a high load average but almost no CPU usage; nearly all of the time is I/O wait (93.1% wa):

top - 13:35:28 up  6:23,  2 users,  load average: 62.46, 75.51, 49.82
Tasks: 229 total,   5 running, 224 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.6% us,  3.6% sy,  0.0% ni,  0.0% id, 93.1% wa,  0.7% hi,  0.0% si
Mem:   1034320k total,  1022192k used,    12128k free,     5824k buffers
Swap:  2096472k total,  1072120k used,  1024352k free,    40772k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10685 apache    16   0 40556 8240 2784 D  1.7  0.8   0:00.05 httpd
10592 nobody    16   0  5096 1516  824 D  1.0  0.1   0:00.20 rsync
 4530 root      16   0  2104  964  676 R  0.3  0.1   0:31.00 top
28909 apache    16   0 57272  18m 5780 D  0.3  1.8   1:11.57 httpd
 8854 apache    16   0 55028  20m 3776 R  0.3  2.0   0:03.47 httpd
 8938 apache    16   0 52212  12m 3880 D  0.3  1.3   0:01.36 httpd
 8985 apache    16   0 52232  12m 4124 D  0.3  1.3   0:01.37 httpd
 9739 neteler   16   0  2108  968  680 R  0.3  0.1   0:03.25 top
10305 apache    15   0 45132  13m 3980 S  0.3  1.4   0:00.68 httpd
10336 apache    15   0 41000 9.9m 4052 D  0.3  1.0   0:00.40 httpd
    1 root      15   0  1744  444  420 S  0.0  0.0   0:00.67 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.06 ksoftirqd/0
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    5 root      19  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   10 root      10  -5     0    0    0 S  0.0  0.0   0:01.28 kblockd/0
   11 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  168 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
...

Half of the swap is in use (about 1 GB of 2 GB), which is not good.
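
For a quick check of the swap figure without reading it off top, something like the following might do. It is only a sketch: the function name swap_used_pct is made up for illustration, and it assumes a Linux /proc/meminfo with SwapTotal and SwapFree lines given in kB.

```shell
# swap_used_pct (hypothetical helper): read /proc/meminfo-style text on
# stdin and print the percentage of swap in use, to one decimal place.
# Prints nothing if no swap is configured (SwapTotal is 0 or missing).
swap_used_pct() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {if (total > 0) printf "%.1f\n", (total - free) * 100 / total}'
}

# Usage on a live Linux system:
#   swap_used_pct < /proc/meminfo
```

Fed the totals from the top output above (2096472k total, 1024352k free), this works out to roughly 51%.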


Searching for processes in state D (uninterruptible sleep, usually waiting on I/O):

top -b -n 1 | awk '{if (NR <=7) print; else if ($8 == "D") {print;
count++} } END {print "Total status D: "count}'
top - 13:37:10 up  6:25,  2 users,  load average: 83.92, 78.91, 53.75
...
10774 root      17   0  6928 5224 1644 D  5.8  0.5   0:00.25 mrtg
10305 apache    16   0 45132  13m 3980 D  1.9  1.4   0:00.87 httpd
  400 root      10  -5     0    0    0 D  0.0  0.0   0:02.28 kjournald
  898 root      10  -5     0    0    0 D  0.0  0.0   0:01.26 kjournald
 1223 root      15   0  1612  540  492 D  0.0  0.1   0:00.11 syslogd
 1462 root      16   0  1824  576  532 D  0.0  0.1   0:00.19 automount
 3295 apache    16   0 63916  12m 6156 D  0.0  1.2   3:06.06 httpd
26976 apache    16   0 63140  18m 6156 D  0.0  1.8   1:29.20 httpd
28446 apache    16   0 63236  15m 5808 D  0.0  1.6   1:09.26 httpd
28906 apache    16   0 57188  12m 5776 D  0.0  1.3   1:16.37 httpd
...
10816 apache    16   0 39080 7184 3148 D  0.0  0.7   0:00.03 httpd
10820 apache    16   0 38868 5808 2248 D  0.0  0.6   0:00.04 httpd
10885 apache    15   0 38868 4828 1516 D  0.0  0.5   0:00.00 httpd
10894 neteler   16   0  1572  420  364 D  0.0  0.0   0:00.01 md5sum
10896 neteler   16   0   228    8    0 D  0.0  0.0   0:00.00 cut
Total status D: 80
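
The same count can probably be had from ps instead of top, which also allows showing the kernel function each process is waiting in. A minimal sketch, assuming a procps-style ps that accepts -eo with the stat, wchan and comm columns; the function name list_d_state is made up for illustration.

```shell
# list_d_state (hypothetical helper): read ps output on stdin, with the
# process state in the first column, keep the header line, print every
# process whose state starts with D, and finish with a total.
list_d_state() {
    awk 'NR == 1 {print; next}
         $1 ~ /^D/ {print; count++}
         END {print "Total status D: " (count + 0)}'
}

# Usage (assumes procps ps; wchan is the kernel wait channel):
#   ps -eo stat,pid,user,wchan:32,comm | list_d_state
```

If most of the D-state entries share one wait channel, that may hint at which disk or filesystem is blocking, but that is a guess until someone looks at the machine.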

Way too many jobs stuck in state D.
Sounds like it needs a reboot again?

Markus

