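Process Context Switching
Problem
Today I found that SSH logins to our test server were extremely sluggish. It was almost impossible to log in, and even after getting in, viewing logs would freeze the terminal.
Finding the cause
Running ssh with the -v flag showed that the connection kept hanging right after the "SSH2_MSG_KEXINIT sent" line: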
$ ssh -v test
OpenSSH_6.9p1 Ubuntu-2, OpenSSL 1.0.2d 9 Jul 2015
debug1: Reading configuration data /home/ch/.ssh/config
debug1: /home/ch/.ssh/config line 1: Applying options for test
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to *.*.*.* [*.*.*.*] port 22.
debug1: Connection established.
debug1: key_load_public: No such file or directory
debug1: identity file /home/ch/.ssh/id_rsa_test type -1
debug1: key_load_public: No such file or directory
debug1: identity file /home/ch/.ssh/id_rsa_test-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_6.9p1 Ubuntu-2
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 pat OpenSSH_6.6.1* compat 0x04000000
debug1: Authenticating to 120.24.180.89:22 as 'root'
debug1: SSH2_MSG_KEXINIT sent
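Running top -o %CPU (some versions of top have no -o option) showed a php process using 99.9% of the CPU. With some difficulty I managed to type reboot and run it, but after the restart things were only slightly better.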
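If top is slow to start on a struggling machine, or your top has no -o option, a similar CPU ranking can be approximated by sampling /proc directly. The following is a minimal, Linux-only sketch. It assumes the proc(5) layout of /proc/<pid>/stat, where utime and stime are fields 14 and 15 in clock ticks. The helper names parse_cpu_ticks, snapshot and top_cpu are hypothetical and not part of any tool.

```python
import os
import time


def parse_cpu_ticks(stat_line: str) -> tuple[str, int]:
    """Return (command name, utime + stime in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' before splitting the remaining fields.
    head, _, rest = stat_line.rpartition(")")
    comm = head.split("(", 1)[1]
    fields = rest.split()
    # fields[0] is field 3 (state); utime is field 14 and stime is field 15.
    utime, stime = int(fields[11]), int(fields[12])
    return comm, utime + stime


def snapshot() -> dict[int, tuple[str, int]]:
    """Read cumulative CPU ticks for every process currently visible under /proc."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                result[int(entry)] = parse_cpu_ticks(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # the process exited (or is unreadable) between listdir() and read()
    return result


def top_cpu(interval: float = 1.0, n: int = 5) -> list[tuple[float, int, str]]:
    """Return up to n (cpu_percent, pid, name) tuples for the busiest processes over `interval` seconds."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = snapshot()
    time.sleep(interval)
    after = snapshot()
    usage = []
    for pid, (comm, ticks) in after.items():
        if pid in before:  # skip processes that started during the interval
            delta = ticks - before[pid][1]
            usage.append((100.0 * delta / hz / interval, pid, comm))
    return sorted(usage, reverse=True)[:n]
```

Calling top_cpu(1.0, 5) returns the five busiest processes over one second. The percentage is relative to a single core, so a runaway loop like the php process here would likely appear close to 100%.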
$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 79320 32996 393880 0 0 0 68 4855 18998 0 1 99 0 0
1 0 0 79296 32996 393888 0 0 0 0 4864 19101 0 0 99 0 1
1 0 0 79236 32996 393888 0 0 0 0 4849 19057 1 0 99 0 0
1 0 0 79208 32996 393888 0 0 0 0 4863 19181 0 0 100 0 0
1 0 0 79180 32996 393888 0 0 0 0 4809 18997 1 0 97 0 1
1 0 0 79152 32996 393888 0 0 0 0 4814 18920 3 0 97 0 0
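The cs column (context switches per second) is very high, around 19,000 per second, even though the CPU is almost entirely idle. In other words, the system is switching contexts extremely often.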
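The cs column is a rate. The kernel keeps a cumulative, system-wide context-switch counter on the ctxt line of /proc/stat, and each row of vmstat 1 reflects how much such a counter grew during that second. The following is a minimal Linux-only Python sketch of that calculation. The names read_ctxt and cs_per_second are illustrative and are not vmstat internals.

```python
import time


def read_ctxt(stat_text: str) -> int:
    """Extract the cumulative system-wide context-switch count from /proc/stat text."""
    for line in stat_text.splitlines():
        if line.startswith("ctxt "):
            return int(line.split()[1])
    raise ValueError("no 'ctxt' line found")


def cs_per_second(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample the counter twice and return switches per second, like vmstat's cs column."""
    with open(path) as f:
        first = read_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = read_ctxt(f.read())
    return (second - first) / interval
```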
$ pidstat -w 1
Average: UID PID cswch/s nvcswch/s Command
Average: 0 3 18.81 0.00 ksoftirqd/0
Average: 0 6 0.20 0.00 kworker/u16:0
Average: 0 7 39.47 0.00 rcu_sched
Average: 0 10 0.41 0.00 watchdog/0
Average: 0 16 181.80 0.00 kworker/0:1
Average: 0 125 0.61 0.00 jbd2/xvda1-8
Average: 0 836 0.20 0.00 kworker/u17:1
Average: 0 843 1.64 0.00 php5-fpm
Average: 65534 1012 0.20 0.00 nscd
Average: 106 1037 6578.53 0.00 redis-server
Average: 0 1085 0.20 0.00 AliYunDunUpdate
Average: 0 1104 28.63 0.00 AliYunDun
Average: 0 1149 1.43 0.00 apache2
Average: 0 1163 28.63 0.00 AliHids
Average: 103 1479 1.43 0.00 ntpd
Average: 0 1894 7.16 0.00 PM2
Average: 0 1965 142.33 0.00 mongod
Average: 0 2006 2.86 0.00 node
Average: 0 6815 2.86 0.00 node
Average: 0 7068 6682.21 6474.64 php
Average: 0 7775 2.86 0.00 node
Average: 0 7780 2.86 0.00 node
Average: 0 7785 2.86 0.00 node
Average: 33 8127 2.86 0.00 nginx
Average: 33 8128 2.86 0.00 nginx
Average: 33 8129 2.86 0.00 nginx
Average: 0 14255 180.16 5.93 sshd
Average: 0 14545 1.43 195.09 pidstat
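The output shows that the php process (PID 7068) has a very high cswch/s (voluntary context switches, about 6,682 per second). It also has by far the highest nvcswch/s (non-voluntary context switches, about 6,475 per second).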
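As far as I know, pidstat -w reads its per-process numbers from the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/<pid>/status. A voluntary switch happens when a process blocks or sleeps on its own. A non-voluntary switch happens when the scheduler preempts it, for example because its time slice ran out. The following sketch samples both counters for a single PID. It assumes a Linux /proc, and the function names are hypothetical.

```python
import time


def parse_ctxt_switches(status_text: str) -> tuple[int, int]:
    """Return (voluntary, nonvoluntary) context-switch totals from /proc/<pid>/status text."""
    counts = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches"):
            counts[key] = int(value)  # int() tolerates the surrounding tab/whitespace
    return counts["voluntary_ctxt_switches"], counts["nonvoluntary_ctxt_switches"]


def ctxt_rates(pid: int, interval: float = 1.0) -> tuple[float, float]:
    """Per-second (cswch/s, nvcswch/s) for one process over `interval` seconds."""
    def sample() -> tuple[int, int]:
        with open(f"/proc/{pid}/status") as f:
            return parse_ctxt_switches(f.read())

    v1, n1 = sample()
    time.sleep(interval)
    v2, n2 = sample()
    return (v2 - v1) / interval, (n2 - n1) / interval
```

For the php process in the output, the equivalent call would have been ctxt_rates(7068).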
Brute-force fix
Simply killing the process brought the server back to normal. I then restarted php by hand.
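Root cause
After testing the various php processes one by one, I finally traced the problem to an infinite loop whose usleep interval was too short, so the process was context-switching constantly.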
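The mechanism is easy to reproduce. Every usleep() call puts the process to sleep, and each sleep/wake cycle costs at least one voluntary context switch. A tight loop with a tiny sleep therefore generates thousands of switches per second while doing no useful work. The original PHP script is not shown, so the following Python sketch is only a stand-in. Note that time.sleep takes seconds, whereas PHP's usleep takes microseconds. It counts the loop's voluntary switches through the ru_nvcsw field of getrusage. The name polling_loop is hypothetical.

```python
import resource
import time


def voluntary_switches() -> int:
    """Voluntary context switches this process has made so far (getrusage ru_nvcsw)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw


def polling_loop(sleep_s: float, duration_s: float) -> int:
    """Run an idle polling loop for duration_s seconds; return the voluntary switches it caused.

    Each sleep blocks the process, so the scheduler switches it out. A shorter
    sleep means more iterations and therefore more voluntary context switches.
    """
    start_switches = voluntary_switches()
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        # A real worker would check for work here (hypothetical, e.g. a job queue).
        time.sleep(sleep_s)
    return voluntary_switches() - start_switches
```

On Linux, polling_loop(0.00001, 0.5) should report thousands of switches, while polling_loop(0.05, 0.5) should report only around ten. The usual fixes are to lengthen the sleep or, better, to block on something (a socket, a queue, a condition variable) so that the process wakes only when there is work.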
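References
Checking Linux server performance in one minute with ten commands
Frequent process context switching causing a high load average
Process context switching: the brutal performance killer (Part 1)
Linux vmstat command: a detailed hands-on guide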