html tool

Monday, May 7, 2018

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

References:
http://blog.51cto.com/10983441/1782411
http://www.361way.com/kernel-hung-task-analysis/4326.html

"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
1735 blocked for more than 120 seconds

Cause:

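By default, Linux lets dirty pages (cached file data not yet written to disk) take up to a fixed percentage of available memory, set by vm.dirty_ratio (40% on older kernels; many current kernels default to 20%). Once this threshold is exceeded, the kernel flushes the cached data to disk and processes that write are blocked until enough of it has been written, so subsequent I/O requests effectively become synchronous.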

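The "120 seconds" in the message is the default value of kernel.hung_task_timeout_secs: the kernel's hung-task detector (khungtaskd) reports any task that has stayed in uninterruptible sleep (D state) for longer than that. The problem above appears because the I/O subsystem is not fast enough to write the cached data to disk, so tasks waiting on that I/O remain blocked for more than 120 seconds.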

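Because the I/O system responds slowly, more and more requests pile up; eventually all system memory is used up and the system stops responding.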
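The thresholds mentioned above can be checked on a running system. Below is a minimal POSIX sh sketch, assuming a Linux host with /proc mounted; `show_setting` is a hypothetical helper, not a standard tool. Where procps is installed, `sysctl vm.dirty_ratio vm.dirty_background_ratio kernel.hung_task_timeout_secs` prints the same values. Note that writing 0 to hung_task_timeout_secs, as the log message suggests, only silences the warning; it does not make the I/O any faster.

```shell
#!/bin/sh
# Print a sysctl value by reading /proc/sys directly.
# show_setting is a hypothetical helper name used for illustration.
show_setting() {
    # vm.dirty_ratio -> /proc/sys/vm/dirty_ratio
    path="/proc/sys/$(printf '%s' "$1" | tr . /)"
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$1" "$(cat "$path")"
    else
        # e.g. hung_task_timeout_secs is absent without CONFIG_DETECT_HUNG_TASK
        printf '%s = (not available)\n' "$1"
    fi
}

# % of memory that may be dirty before writers must flush synchronously
show_setting vm.dirty_ratio
# % of memory at which background writeback starts
show_setting vm.dirty_background_ratio
# seconds a task may stay in D state before the hung-task warning (0 = off)
show_setting kernel.hung_task_timeout_secs

# Dirty data still in the cache and data currently being written back, in kB
grep -E '^(Dirty|Writeback):' /proc/meminfo
```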




Errors from the messages log:

...
...
Apr 24 08:44:17 10-9-179-169 kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
Apr 24 08:44:17 10-9-179-169 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 24 08:44:17 10-9-179-169 kernel: kthreadd        D ffff8800366aeaf0     0     2      0 0x00000000
Apr 24 08:44:17 10-9-179-169 kernel: ffff88007c00f810 0000000000000046 ffff88007c798b80 ffff88007c00ffd8
Apr 24 08:44:17 10-9-179-169 kernel: ffff88007c00ffd8 ffff88007c00ffd8 ffff88007c798b80 ffff8800366aeae8
Apr 24 08:44:17 10-9-179-169 kernel: ffff8800366aeaec ffff88007c798b80 00000000ffffffff ffff8800366aeaf0
Apr 24 08:44:17 10-9-179-169 kernel: Call Trace:
Apr 24 08:44:17 10-9-179-169 kernel: [] schedule_preempt_disabled+0x29/0x70
Apr 24 08:44:17 10-9-179-169 kernel: [] __mutex_lock_slowpath+0xc5/0x1c0
Apr 24 08:44:17 10-9-179-169 kernel: [] mutex_lock+0x1f/0x2f
Apr 24 08:44:17 10-9-179-169 kernel: [] xfs_reclaim_inodes_ag+0x2dc/0x390 [xfs]
Apr 24 08:44:17 10-9-179-169 kernel: [] ? __wake_up+0x44/0x50
Apr 24 08:44:17 10-9-179-169 kernel: [] ? call_rcu_sched+0x1d/0x20
Apr 24 08:44:17 10-9-179-169 kernel: [] ? d_free+0x4d/0x70
Apr 24 08:44:17 10-9-179-169 kernel: [] ? shrink_dentry_list+0x250/0x480
Apr 24 08:44:17 10-9-179-169 kernel: [] xfs_reclaim_inodes_nr+0x33/0x40 [xfs]
Apr 24 08:44:17 10-9-179-169 kernel: [] xfs_fs_free_cached_objects+0x15/0x20 [xfs]
Apr 24 08:44:17 10-9-179-169 kernel: [] prune_super+0xe8/0x170
Apr 24 08:44:17 10-9-179-169 kernel: [] shrink_slab+0x165/0x300
Apr 24 08:44:17 10-9-179-169 kernel: [] ? vmpressure+0x21/0x90
Apr 24 08:44:17 10-9-179-169 kernel: [] do_try_to_free_pages+0x3c2/0x4e0
Apr 24 08:44:17 10-9-179-169 kernel: [] try_to_free_pages+0xfc/0x180
Apr 24 08:44:17 10-9-179-169 kernel: [] __alloc_pages_nodemask+0x808/0xba0
Apr 24 08:44:17 10-9-179-169 kernel: [] copy_process.part.25+0x163/0x1610
Apr 24 08:44:17 10-9-179-169 kernel: [] ? dequeue_task_fair+0x42e/0x640
Apr 24 08:44:17 10-9-179-169 kernel: [] ? kthread_create_on_node+0x140/0x140
Apr 24 08:44:17 10-9-179-169 kernel: [] do_fork+0xe1/0x320
Apr 24 08:44:17 10-9-179-169 kernel: [] kernel_thread+0x26/0x30
Apr 24 08:44:17 10-9-179-169 kernel: [] kthreadd+0x2b2/0x2f0
Apr 24 08:44:17 10-9-179-169 kernel: [] ? kthread_create_on_cpu+0x60/0x60
Apr 24 08:44:17 10-9-179-169 kernel: [] ret_from_fork+0x58/0x90
Apr 24 08:44:17 10-9-179-169 kernel: [] ? kthread_create_on_cpu+0x60/0x60
Apr 24 08:44:17 10-9-179-169 kernel: INFO: task kswapd0:27 blocked for more than 120 seconds.
Apr 24 08:44:17 10-9-179-169 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

...
The filtered messages:
Apr 24 08:44:17 10-9-179-169 kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
Apr 24 08:44:17 10-9-179-169 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
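kthreadd (PID 2) is created by the idle process via kernel_thread and always runs in kernel space; it is responsible for creating and managing all other kernel threads.
Reference: https://blog.csdn.net/gatieme/article/details/51566690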
--
Apr 24 08:44:17 10-9-179-169 kernel: INFO: task kswapd0:27 blocked for more than 120 seconds.
Apr 24 08:44:17 10-9-179-169 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
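kswapd0 is the virtual memory manager's page-reclaim (swapping) thread. When the server runs short of memory, kswapd0 reclaims pages and swaps them out, and this work can consume a lot of the host's CPU.
Reference: https://blog.csdn.net/u012129607/article/details/74993302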
--
Apr 24 08:44:17 10-9-179-169 kernel: INFO: task xfsaild/vdb:405 blocked for more than 120 seconds.
Apr 24 08:44:17 10-9-179-169 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
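xfsaild: the XFS AIL (Active Item List) daemon, which writes logged metadata back to disk; the /vdb suffix suggests it is the instance serving the file system on the vdb disk.
Reference: http://www.dianyue.me/archives/302/ylep8un1wji2kc4a/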
--
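The filtered list above looks like the output of `grep -A 1 'blocked for more than'` run on the log (the `--` lines are grep's group separators). To get just the names of the blocked tasks, a small sketch like the following could be used. The helper name `hung_tasks` and the default path /var/log/messages are assumptions: Debian/Ubuntu usually log to /var/log/syslog or /var/log/kern.log, and on systemd hosts `journalctl -k` shows the kernel messages.

```shell
#!/bin/sh
# List the tasks reported by the kernel hung-task detector, one per line,
# as NAME:PID. hung_tasks is a hypothetical helper; the default path is an
# assumption.
hung_tasks() {
    log="${1:-/var/log/messages}"
    # Matches lines like:
    #   ... kernel: INFO: task kthreadd:2 blocked for more than 120 seconds.
    grep -o 'INFO: task .* blocked for more than' "$log" |
        sed -e 's/^INFO: task //' -e 's/ blocked for more than$//' |
        LC_ALL=C sort -u
}
```

Run against the log above, `hung_tasks` would list kswapd0:27, kthreadd:2 and xfsaild/vdb:405, plus any other tasks in the elided parts of the log; repeated reports of the same task are collapsed into one line.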


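Well, if even kthreadd is stuck, how can the system be expected to keep working? The call trace points to memory pressure: kthreadd was creating a new thread (copy_process), could not allocate pages (__alloc_pages_nodemask), fell into direct reclaim (try_to_free_pages, shrink_slab), and then waited on a mutex inside XFS inode reclaim (xfs_reclaim_inodes_ag, mutex_lock). So this looks like a swap problem caused by insufficient memory; which process is using too much memory still needs to be tracked down.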
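One way to start looking for the memory consumer is to rank processes by resident set size. `ps aux --sort=-rss | head` does this where procps is installed; the sketch below reads /proc directly instead. `top_rss` is a hypothetical helper name, and RSS ignores kernel memory such as page cache and slab caches, so it is only a first step.

```shell
#!/bin/sh
# Print the N processes with the largest resident set size as
# "RSS_kB PID NAME", largest first, reading /proc/<pid>/status directly.
# top_rss is a hypothetical helper name.
top_rss() {
    n="${1:-10}"
    for status in /proc/[0-9]*/status; do
        # Kernel threads have no VmRSS line and are skipped; a process that
        # exits mid-loop just produces no output (errors are discarded).
        awk '/^Name:/  { sub(/^Name:[ \t]*/, ""); name = $0; next }
             /^Pid:/   { pid = $2 }
             /^VmRSS:/ { rss = $2 }
             END       { if (rss != "") print rss, pid, name }' \
            "$status" 2>/dev/null
    done | sort -rn | head -n "$n"
}
```

For example, `top_rss 5` prints the five largest processes by RSS.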
