早上一到,发现oracle连不上。
到主机上,发现只有oracleora11g一个进程,其他进程全没了。
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: INFO: task sadc:14833 blocked for more than 120 seconds.
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: sadc D 0000000000000000 0 14833 14832 0x00000084
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff88061533bdc8 0000000000000086 0000000000000000 ffff88061533bde8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff88061533bd88 ffffffff8111f3e0 ffff880528dab9d0 ffff88061533bde8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: ffff880614125af8 ffff88061533bfd8 000000000000fbc8 ffff880614125af8
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: Call Trace:
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff8111f3e0>] ? find_get_pages_tag+0x40/0x130
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffffa02b65a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff81134c91>] ? do_writepages+0x21/0x40
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffffa02b6938>] jbd2_complete_transaction+0x68/0xb0 [jbd2]
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffffa02d2231>] ext4_sync_file+0x121/0x1d0 [ext4]
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff811baa61>] vfs_fsync_range+0xa1/0x100
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff811bab2d>] vfs_fsync+0x1d/0x20
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff811bab6e>] do_fsync+0x3e/0x60
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff811baba3>] sys_fdatasync+0x13/0x20
Nov 14 23:33:30 hs-test-10-20-30-15 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: NetworkManage D 0000000000000001 0 2081 1 0x00000080
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: ffff880614185dc8 0000000000000082 0000000000000000 ffff880613b13e80
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: 0000000000000000 ffff880612e5e0d0 0000000000000000 0000000000000000
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: ffff88061464bab8 ffff880614185fd8 000000000000fbc8 ffff88061464bab8
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffffa02b65a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff81134c91>] ? do_writepages+0x21/0x40
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffffa02b6938>] jbd2_complete_transaction+0x68/0xb0 [jbd2]
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffffa02d2231>] ext4_sync_file+0x121/0x1d0 [ext4]
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff811baa61>] vfs_fsync_range+0xa1/0x100
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff811bab2d>] vfs_fsync+0x1d/0x20
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff811bab6e>] do_fsync+0x3e/0x60
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff811babc0>] sys_fsync+0x10/0x20
Nov 15 00:01:29 hs-test-10-20-30-15 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task NetworkManager:2081 blocked for more than 120 seconds.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: NetworkManage D 0000000000000001 0 2081 1 0x00000080
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff880614185dc8 0000000000000082 0000000000000000 ffff880613b13e80
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: 0000000000000000 ffff880612e5e0d0 0000000000000000 0000000000000000
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88061464bab8 ffff880614185fd8 000000000000fbc8 ffff88061464bab8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffffa02b65a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff81134c91>] ? do_writepages+0x21/0x40
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffffa02b6938>] jbd2_complete_transaction+0x68/0xb0 [jbd2]
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffffa02d2231>] ext4_sync_file+0x121/0x1d0 [ext4]
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff811baa61>] vfs_fsync_range+0xa1/0x100
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff811bab2d>] vfs_fsync+0x1d/0x20
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff811bab6e>] do_fsync+0x3e/0x60
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff811babc0>] sys_fsync+0x10/0x20
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: INFO: task sadc:15210 blocked for more than 120 seconds.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Not tainted 2.6.32-431.el6.x86_64 #1
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: sadc D 0000000000000000 0 15210 15209 0x00000084
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88091ed9bdc8 0000000000000082 0000000000000000 ffff88091ed9bde8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88091ed9bd88 ffffffff8111f3e0 ffff88008f60a9d0 ffff88091ed9bde8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: ffff88061439bab8 ffff88091ed9bfd8 000000000000fbc8 ffff88061439bab8
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: Call Trace:
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff8111f3e0>] ? find_get_pages_tag+0x40/0x130
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffffa02b65a5>] jbd2_log_wait_commit+0xc5/0x140 [jbd2]
Nov 15 00:03:29 hs-test-10-20-30-15 kernel: [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40
原因以及排查思路:
Under heavy IO load on servers you may see something like:
INFO: task nfsd:2252 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
...probably followed by a call trace that mentions your filesystem, and probably io_schedule and sync_buffer.
This message is not an error.
It is an indication that a program has had to wait for a very long time, and what it was doing. (which is not so informative of the reason - it‘s common that the real IO load issue comes from another process)
The code behind this sits in hung_task.c and was added somewhere around 2.6.30. This is a kernel thread that detects tasks that stays in the D state for a while (which typically meaning it is waiting for IO).
It complains when it sees a process has been waiting on IO so long that the whole process has not been scheduled for any CPU-time for 120 seconds (default).
Notes:
- if it happens constantly your IO system is slower than your IO use
- most likely to happen to a process that was ioniced into the idle class. Which means it‘s working, idle-class is meant as an extreme politeness thing. It just indicates something else is doing a bunch of IO right now (for at least 120 seconds)
- e.g. updatedb (may be victim if it were ioniced, cause if not)
- if it happens only nightly, look at your cron jobs
- a trashing system can cause this, and then it‘s purely a side effect of one program using too much RAM
- being blocked by a desktop-class drive with bad sectors (because they retry for a long while)
- NFS seems to be a common culprit, probably because it‘s good at filling the writeback cache, something which implies blocking while writeback happens - which is likely to block various things related to the same filesystem. (verify)
- if it happens on a fileserver, you may want to consider spreading to more fileservers, or using a parallel filesystem
- tweaking the linux io scheduler for the device may help (See Computer_data_storage_-_General_&_RAID_performance_tweaking#OS_scheduling)
- if your load is fairly sequential, you may get some relief from using the noop io scheduler (instead of cfq) though note that that disables ionice)
- if your load is relatively random, upping the queue depth may help
原文地址:https://www.cnblogs.com/zhjh256/p/9961414.html