Process Context Switching

Problem

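Today I found that SSH logins to our test server were extremely sluggish. It was almost impossible to log in, and even once I got in, viewing logs would freeze the terminal.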

Finding the Cause

Running ssh with the -v option showed the connection hanging at the "sent" step every time:

$ ssh -v test
OpenSSH_6.9p1 Ubuntu-2, OpenSSL 1.0.2d 9 Jul 2015  
debug1: Reading configuration data /home/ch/.ssh/config  
debug1: /home/ch/.ssh/config line 1: Applying options for test  
debug1: Reading configuration data /etc/ssh/ssh_config  
debug1: /etc/ssh/ssh_config line 19: Applying options for *  
debug1: Connecting to *.*.*.* [*.*.*.*] port 22.  
debug1: Connection established.  
debug1: key_load_public: No such file or directory  
debug1: identity file /home/ch/.ssh/id_rsa_test type -1  
debug1: key_load_public: No such file or directory  
debug1: identity file /home/ch/.ssh/id_rsa_test-cert type -1  
debug1: Enabling compatibility mode for protocol 2.0  
debug1: Local version string SSH-2.0-OpenSSH_6.9p1 Ubuntu-2  
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3  
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 pat OpenSSH_6.6.1* compat 0x04000000  
debug1: Authenticating to 120.24.180.89:22 as 'root'  
debug1: SSH2_MSG_KEXINIT sent  

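I ran top -o %CPU (some versions of top do not support the -o option) and found that a php process was using 99.9% of the CPU. I eventually managed to get a reboot command through, but after the restart things were only slightly better.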

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----  
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0  79320  32996 393880    0    0     0    68 4855 18998  0  1 99  0  0
 1  0      0  79296  32996 393888    0    0     0     0 4864 19101  0  0 99  0  1
 1  0      0  79236  32996 393888    0    0     0     0 4849 19057  1  0 99  0  0
 1  0      0  79208  32996 393888    0    0     0     0 4863 19181  0  0 100  0  0
 1  0      0  79180  32996 393888    0    0     0     0 4809 18997  1  0 97  0  1
 1  0      0  79152  32996 393888    0    0     0     0 4814 18920  3  0 97  0  0

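The cs column, which counts context switches per second, is far too high: roughly 19,000 per second, even though the CPU is about 99% idle. In other words, the system is context switching extremely often.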
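To put a number on "too high", here is a minimal Python sketch that parses vmstat text and averages the cs, in and id columns. The helper names parse_vmstat and summarize are my own, and the sample is simply the output shown above pasted in as a string.

```python
def parse_vmstat(text):
    """Return one dict per sample row, keyed by the vmstat column names."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            # Column-name line: "r b swpd free ... cs us sy id wa st"
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def summarize(rows):
    """Average the context-switch (cs), interrupt (in) and idle (id) columns."""
    if not rows:
        raise ValueError("no vmstat sample rows found")
    return {key: sum(r[key] for r in rows) / len(rows) for key in ("cs", "in", "id")}


# The vmstat output from this post, pasted verbatim.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0  79320  32996 393880    0    0     0    68 4855 18998  0  1 99  0  0
 1  0      0  79296  32996 393888    0    0     0     0 4864 19101  0  0 99  0  1
 1  0      0  79236  32996 393888    0    0     0     0 4849 19057  1  0 99  0  0
 1  0      0  79208  32996 393888    0    0     0     0 4863 19181  0  0 100  0  0
 1  0      0  79180  32996 393888    0    0     0     0 4809 18997  1  0 97  0  1
 1  0      0  79152  32996 393888    0    0     0     0 4814 18920  3  0 97  0  0
"""

if __name__ == "__main__":
    stats = summarize(parse_vmstat(SAMPLE))
    print(f"avg cs/s={stats['cs']:.0f}  avg in/s={stats['in']:.0f}  avg idle={stats['id']:.1f}%")
```

For these six samples the average works out to about 19,042 context switches per second at 98.5% idle, which is the mismatch that pointed me away from plain CPU load.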

$ pidstat -w 1
Average:      UID       PID   cswch/s nvcswch/s  Command  
Average:        0         3     18.81      0.00  ksoftirqd/0  
Average:        0         6      0.20      0.00  kworker/u16:0  
Average:        0         7     39.47      0.00  rcu_sched  
Average:        0        10      0.41      0.00  watchdog/0  
Average:        0        16    181.80      0.00  kworker/0:1  
Average:        0       125      0.61      0.00  jbd2/xvda1-8  
Average:        0       836      0.20      0.00  kworker/u17:1  
Average:        0       843      1.64      0.00  php5-fpm  
Average:    65534      1012      0.20      0.00  nscd  
Average:      106      1037   6578.53      0.00  redis-server  
Average:        0      1085      0.20      0.00  AliYunDunUpdate  
Average:        0      1104     28.63      0.00  AliYunDun  
Average:        0      1149      1.43      0.00  apache2  
Average:        0      1163     28.63      0.00  AliHids  
Average:      103      1479      1.43      0.00  ntpd  
Average:        0      1894      7.16      0.00  PM2  
Average:        0      1965    142.33      0.00  mongod  
Average:        0      2006      2.86      0.00  node  
Average:        0      6815      2.86      0.00  node  
Average:        0      7068   6682.21   6474.64  php  
Average:        0      7775      2.86      0.00  node  
Average:        0      7780      2.86      0.00  node  
Average:        0      7785      2.86      0.00  node  
Average:       33      8127      2.86      0.00  nginx  
Average:       33      8128      2.86      0.00  nginx  
Average:       33      8129      2.86      0.00  nginx  
Average:        0     14255    180.16      5.93  sshd  
Average:        0     14545      1.43    195.09  pidstat  

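The output shows that the php process (PID 7068) stands out: both its cswch/s (voluntary context switches) and nvcswch/s (involuntary context switches) are very high, at about 6,682 and 6,475 per second.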
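On Linux the same two counters can also be read directly from /proc/<pid>/status, in the voluntary_ctxt_switches and nonvoluntary_ctxt_switches fields. The following is a rough Python sketch of sampling them twice to get per-second rates, similar in spirit to pidstat -w. The function names are hypothetical, and it assumes a Linux /proc filesystem.

```python
import time

FIELDS = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")


def parse_ctxt_switches(status_text):
    """Extract the two context-switch counters from /proc/<pid>/status text."""
    counters = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in FIELDS:
            counters[key] = int(value)  # int() ignores the surrounding whitespace
    return counters


def read_ctxt_switches(pid):
    """Read the cumulative counters for one process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())


def switch_rates(pid, interval=1.0):
    """Sample the counters twice and return switches per second for each field."""
    first = read_ctxt_switches(pid)
    time.sleep(interval)
    second = read_ctxt_switches(pid)
    return {name: (second[name] - first[name]) / interval for name in FIELDS}


if __name__ == "__main__":
    import os
    # Inspect this script's own process as a harmless demonstration.
    print(switch_rates(os.getpid(), interval=0.5))
```

The voluntary rate corresponds to pidstat's cswch/s column and the nonvoluntary rate to nvcswch/s. Passing the PID of a suspect process instead of os.getpid() gives a quick check when pidstat is not installed.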

Brute-Force Fix

I simply killed the process, and everything went back to normal. Then I restarted php manually.

Actual Cause

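After testing the different php processes one by one, I finally tracked it down to an infinite loop: its usleep interval was too short, so the process was being switched in and out constantly.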
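The effect is easy to reproduce in miniature. The sketch below is a Python stand-in, not the original PHP code: it runs a sleep loop for a fixed duration and reports how many voluntary context switches the process accumulated, using the standard resource module (Unix only). The function name and the sleep values are assumptions chosen for illustration.

```python
import resource
import time


def voluntary_switches_during(sleep_seconds, duration=0.5):
    """Run a sleep loop for `duration` seconds.

    Returns (iterations, voluntary context switches accumulated meanwhile).
    """
    # ru_nvcsw is the *voluntary* counter; ru_nivcsw would be the involuntary one.
    # (Do not confuse ru_nvcsw with pidstat's nvcswch/s, which is non-voluntary.)
    before = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    deadline = time.monotonic() + duration
    iterations = 0
    while time.monotonic() < deadline:
        time.sleep(sleep_seconds)  # each sleep gives up the CPU voluntarily
        iterations += 1
    after = resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw
    return iterations, after - before


if __name__ == "__main__":
    for sleep_seconds in (0.0001, 0.01, 0.1):
        loops, switches = voluntary_switches_during(sleep_seconds)
        print(f"sleep={sleep_seconds:>7}s  loops={loops:>6}  voluntary switches={switches}")
```

On a typical Linux machine I would expect the shortest sleep to produce far more switches than the longest, since every wakeup costs at least one. The exact numbers depend on the kernel and timer resolution, so treat the output as indicative only. The practical fix is the same in any language: lengthen the sleep, or block on the event the loop is waiting for instead of polling.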

References

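Checking Linux server performance in one minute with ten commands

Frequent process context switching causing a high load average

Process context switching: a brutal performance killer (part 1)

A hands-on guide to the Linux vmstat command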