Fixing kernel soft lockup errors when running Docker on CentOS 7.0

I run Docker 1.5 on CentOS 7.0. Recently one server kept hanging at irregular intervals. The logs show that the hangs are caused by a kernel bug; the error output is as follows:

May 11 03:43:08 ip-10-10-29-201 kernel: BUG: soft lockup - CPU#4 stuck for 22s! [handler20:1542]
May 11 03:43:08 ip-10-10-29-201 kernel: Modules linked in: iptable_nat nf_nat_ipv4 iptable_filter ip_tables binfmt_misc ipmi_si vfat fat usb_storage mpt3sas mpt2sas raid_class scsi_transport_sas mptctl mptbase dell_rbu tcp_diag inet_diag veth bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop dm_mod openvswitch vxlan ip_tunnel gre libcrc32c xt_nat ipt_MASQUERADE xt_addrtype nf_nat xt_limit ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_conntrack sg nf_conntrack ipmi_devintf iTCO_wdt iTCO_vendor_support dcdbas coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd pcspkr sb_edac edac_core ses enclosure ipmi_msghandler tg3 wmi acpi_power_meter ptp pps_core mei_me mei ntb lpc_ich mperf mfd_core shpchp ext4
May 11 03:43:08 ip-10-10-29-201 kernel: mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm ahci drm libahci libata i2c_core megaraid_sas [last unloaded: ip_tables]
May 11 03:43:08 ip-10-10-29-201 kernel: CPU: 4 PID: 1542 Comm: handler20 Tainted: G        W   --------------   3.10.0-123.el7.x86_64 #1
May 11 03:43:08 ip-10-10-29-201 kernel: Hardware name: Dell Inc. PowerEdge R720/0X6FFV, BIOS 1.6.0 03/07/2013
May 11 03:43:08 ip-10-10-29-201 kernel: task: ffff880418adf1c0 ti: ffff8800c8d08000 task.ti: ffff8800c8d08000
May 11 03:43:08 ip-10-10-29-201 kernel: RIP: 0010:[<ffffffff815e90e7>]  [<ffffffff815e90e7>] _raw_spin_lock+0x37/0x50
May 11 03:43:08 ip-10-10-29-201 kernel: RSP: 0018:ffff88041fc43ac8  EFLAGS: 00000206
May 11 03:43:08 ip-10-10-29-201 kernel: RAX: 000000000000108b RBX: 0000000000000000 RCX: 0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: RDX: 0000000000000002 RSI: 0000000000000002 RDI: ffff88081609c318
May 11 03:43:08 ip-10-10-29-201 kernel: RBP: ffff88041fc43ac8 R08: ffff8801049856d8 R09: ffff88041fc43a00
May 11 03:43:08 ip-10-10-29-201 kernel: R10: 0000000000000000 R11: 00000000e1bec8f9 R12: ffff88041fc43a38
May 11 03:43:08 ip-10-10-29-201 kernel: R13: ffffffff815f2d9d R14: ffff88041fc43ac8 R15: ffff88081609c300
May 11 03:43:08 ip-10-10-29-201 kernel: FS:  00007fb082b8b700(0000) GS:ffff88041fc40000(0000) knlGS:0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 11 03:43:08 ip-10-10-29-201 kernel: CR2: 00007f2a743e6000 CR3: 00000008183c9000 CR4: 00000000000407e0
May 11 03:43:08 ip-10-10-29-201 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 11 03:43:08 ip-10-10-29-201 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 11 03:43:08 ip-10-10-29-201 kernel: Stack:
May 11 03:43:08 ip-10-10-29-201 kernel: ffff88041fc43af8 ffffffffa042429f ffff88003714be00 ffffe8fbefc41540
May 11 03:43:08 ip-10-10-29-201 kernel: ffff880419070e80 ffff88041fc43b30 ffff88041fc43be0 ffffffffa04239a4
May 11 03:43:08 ip-10-10-29-201 kernel: 00000001b9ec8070 ffff88003714be00 ffff88041fc43b28 0000000000000246
May 11 03:43:08 ip-10-10-29-201 kernel: Call Trace:
May 11 03:43:08 ip-10-10-29-201 kernel: <IRQ>
May 11 03:43:08 ip-10-10-29-201 kernel:
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042429f>] ovs_flow_stats_update+0x4f/0xd0 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa04239a4>] ovs_dp_process_received_packet+0x84/0x120 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042a01a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa042b4cd>] vxlan_rcv+0x6d/0x90 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa037b228>] vxlan_udp_encap_recv+0xb8/0x130 [vxlan]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81538bc2>] udp_queue_rcv_skb+0x162/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815394bd>] __udp4_lib_rcv+0x19d/0x690
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815094d0>] ? ip_rcv_finish+0x350/0x350
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815399ca>] udp_rcv+0x1a/0x20
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509584>] ip_local_deliver_finish+0xb4/0x1f0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509858>] ip_local_deliver+0x48/0x80
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815091fd>] ip_rcv_finish+0x7d/0x350
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81509ac4>] ip_rcv+0x234/0x380
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814cfdb6>] __netif_receive_skb_core+0x676/0x870
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814cffc8>] __netif_receive_skb+0x18/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814d0b7e>] process_backlog+0xae/0x180
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814d041a>] net_rx_action+0x15a/0x250
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81067047>] __do_softirq+0xf7/0x290
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815f3a5c>] call_softirq+0x1c/0x30
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81014d25>] do_softirq+0x55/0x90
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff810673e5>] irq_exit+0x115/0x120
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815f4358>] do_IRQ+0x58/0xf0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815e94ad>] common_interrupt+0x6d/0x6d
May 11 03:43:08 ip-10-10-29-201 kernel: <EOI>
May 11 03:43:08 ip-10-10-29-201 kernel:
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0424465>] ? ovs_flow_stats_get+0x145/0x180 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0424453>] ? ovs_flow_stats_get+0x133/0x180 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa04217b7>] ovs_flow_cmd_fill_info+0x1c7/0x320 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0421c5c>] ovs_flow_cmd_build_info.constprop.25+0x6c/0xa0 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffffa0422155>] ovs_flow_cmd_new_or_set+0x4c5/0x520 [openvswitch]
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff8108ec58>] ? __wake_up_common+0x58/0x90
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffcd8>] genl_family_rcv_msg+0x258/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffe50>] ? genl_family_rcv_msg+0x3d0/0x3d0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814ffee1>] genl_rcv_msg+0x91/0xd0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fdf99>] netlink_rcv_skb+0xa9/0xc0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fe4c8>] genl_rcv+0x28/0x40
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fd5bd>] netlink_unicast+0xed/0x1b0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fd9a7>] netlink_sendmsg+0x327/0x760
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fa874>] ? netlink_rcv_wake+0x44/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814fb92b>] ? netlink_recvmsg+0x1cb/0x3e0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b79b0>] sock_sendmsg+0xb0/0xf0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b807f>] ? sock_recvmsg+0xbf/0x100
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff8109b23e>] ? task_scan_min+0x3e/0x60
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff815e908b>] ? _raw_spin_unlock_bh+0x1b/0x40
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff814b7de9>] ___sys_sendmsg+0x3a9/0x3c0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff811f7fa9>] ? ep_scan_ready_list.isra.9+0x1b9/0x1f0
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff811f8123>] ? ep_poll+0x123/0x370
May 11 03:43:08 ip-10-10-29-201 kernel: [<ffffffff81079af3>] ? getrusage+0x43/0x70
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff814b8cd1>] __sys_sendmsg+0x51/0x90
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff814b8d22>] SyS_sendmsg+0x12/0x20
May 11 03:43:09 ip-10-10-29-201 kernel: [<ffffffff815f2119>] system_call_fastpath+0x16/0x1b
May 11 03:43:09 ip-10-10-29-201 kernel: Code: 02 00 f0 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f b7 f2 b8 00 80 00 00 eb 0c 0f 1f 44 00 00 f3 90 83 e8 01 74 0a <0f> b7 0f 66 39 ca 75 f1 5d c3 66 66 66 90 66 66 90 eb da 66 0f
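
To judge how often the lockup happens and which kernel module keeps showing up in the call traces, the log can be summarized with a couple of small helpers. This is only a sketch: the function names are made up for illustration, and the default path /var/log/messages is an assumption about where this host writes kernel messages.

```shell
# Hypothetical helpers for summarizing soft-lockup reports in a syslog-style file.
# The default log path is an assumption; pass another file as the first argument.

# Print the number of "BUG: soft lockup" reports in the log.
count_soft_lockups() {
    log_file="${1:-/var/log/messages}"
    grep -c 'BUG: soft lockup' "$log_file"
}

# Count call-trace frames per loadable module.
# Frames that come from a module end with "+0xOFF/0xLEN [module_name]".
trace_modules() {
    log_file="${1:-/var/log/messages}"
    grep -E '\+0x[0-9a-f]+/0x[0-9a-f]+ \[[A-Za-z0-9_]+\]$' "$log_file" \
        | sed -E 's/.*\[([A-Za-z0-9_]+)\]$/\1/' \
        | sort | uniq -c | sort -rn
}
```

Run against the trace above, trace_modules would list openvswitch and vxlan, which matches the frames at the top of the call trace.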

A search on Stack Overflow indicated that this is a kernel bug and that the fix is to upgrade the kernel.

The steps below upgrade the default 3.10 kernel of CentOS 7.0 to version 4.0.2.

1. Import the GPG key for the yum repository

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org

2. Install the yum repository

rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm

3. Install the new kernel

The ELRepo yum repository provides the mainline kernel, which was version 4.0.2 at the time of writing.

# yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml
Loaded plugins: fastestmirror
MooseFS                                                                                                                                            |  951 B  00:00:00
base                                                                                                                                               | 3.6 kB  00:00:00
elrepo                                                                                                                                             | 2.9 kB  00:00:00
elrepo-kernel                                                                                                                                      | 2.9 kB  00:00:00
extras                                                                                                                                             | 3.4 kB  00:00:00
updates                                                                                                                                            | 3.4 kB  00:00:00
(1/2): elrepo/primary_db                                                                                                                           | 233 kB  00:00:02
(2/2): elrepo-kernel/primary_db                                                                                                                    | 782 kB  00:00:04
MooseFS/primary                                                                                                                                    | 4.2 kB  00:00:00
Loading mirror speeds from cached hostfile
 * base: mirrors.yun-idc.com
 * elrepo: repos.lax-noc.com
 * elrepo-kernel: repos.lax-noc.com
 * extras: mirror.bit.edu.cn
 * updates: mirror.bit.edu.cn
MooseFS                                                                                                                                                             30/30
Resolving Dependencies
--> Running transaction check
---> Package kernel-ml.x86_64 0:4.0.2-1.el7.elrepo will be installed
---> Package kernel-ml-devel.x86_64 0:4.0.2-1.el7.elrepo will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================================================================================================================
 Package                                   Arch                             Version                                         Repository                               Size
==========================================================================================================================================================================
Installing:
 kernel-ml                                 x86_64                           4.0.2-1.el7.elrepo                              elrepo-kernel                            36 M
 kernel-ml-devel                           x86_64                           4.0.2-1.el7.elrepo                              elrepo-kernel                           9.5 M

Transaction Summary
==========================================================================================================================================================================
Install  2 Packages

Total download size: 45 M
Installed size: 199 M
Is this ok [y/d/N]: y
Downloading packages:
(1/2): kernel-ml-4.0.2-1.el7.elrepo.x86_64.rpm                                                                                                     |  36 MB  00:00:11
(2/2): kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64.rpm                                                                                               | 9.5 MB  00:00:31
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                                     1.5 MB/s |  45 MB  00:00:31
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Warning: RPMDB altered outside of yum.
  Installing : kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64                                                                                                              1/2
  Installing : kernel-ml-4.0.2-1.el7.elrepo.x86_64                                                                                                                    2/2
  Verifying  : kernel-ml-4.0.2-1.el7.elrepo.x86_64                                                                                                                    1/2
  Verifying  : kernel-ml-devel-4.0.2-1.el7.elrepo.x86_64                                                                                                              2/2

Installed:
  kernel-ml.x86_64 0:4.0.2-1.el7.elrepo                                            kernel-ml-devel.x86_64 0:4.0.2-1.el7.elrepo

Complete!

4. Check the current kernel version

# uname -r
3.10.0-123.el7.x86_64

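Important: at this point the system is still running the default kernel. If you reboot right after this step, it will still boot the default 3.10 kernel rather than the new 4.0.2 one. To change the boot order, continue with the next step.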

Check the default boot order

# awk -F\' '$1=="menuentry " {print $2}' /etc/grub2.cfg
CentOS Linux (4.0.2-1.el7.elrepo.x86_64) 7 (Core)
CentOS Linux, with Linux 3.10.0-123.el7.x86_64
CentOS Linux, with Linux 0-rescue-18b184aa09434ecf9739a70c6b63638a

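Boot entries are numbered from 0, and the new kernel is inserted at the top of the list: 4.0.2 is at index 0, and the old 3.10 kernel is now at index 1. To boot the new kernel, set the default entry to 0: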

# grub2-set-default 0
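
Instead of reading the index off the list by eye, it can be looked up by kernel version. The sketch below reuses the same awk field split on the single quote; the function names are hypothetical, and the default path /etc/grub2.cfg is taken from the command above.

```shell
# Hypothetical helpers; the default grub config path matches the command above.

# Print every top-level GRUB menu entry prefixed with its zero-based index.
list_grub_entries() {
    grub_cfg="${1:-/etc/grub2.cfg}"
    awk -F"'" '$1 == "menuentry " { printf "%d: %s\n", n++, $2 }' "$grub_cfg"
}

# Print the index of the first entry whose title contains the given version string.
grub_index_of() {
    version="$1"
    grub_cfg="${2:-/etc/grub2.cfg}"
    list_grub_entries "$grub_cfg" | grep -F "$version" | head -n 1 | cut -d: -f1
}
```

With these defined, `grub2-set-default "$(grub_index_of 4.0.2-1.el7.elrepo)"` would select the new kernel without hard-coding 0, and `grub2-editenv list` should then show the saved entry before you reboot.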

5. Reboot

# reboot

6. Check the kernel version after rebooting

# uname -r
4.0.2-1.el7.elrepo.x86_64
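
On a fleet of hosts the same check can be scripted. The following is a minimal sketch, assuming GNU `sort -V` is available (it ships with coreutils on CentOS 7); the function name is made up for illustration.

```shell
# Hypothetical helper: succeed if the running kernel is at least the required version.
# Relies on GNU sort's version ordering (-V); that is an assumption about the host.
kernel_is_at_least() {
    required="$1"
    running="${2:-$(uname -r)}"
    # The lower of the two versions sorts first; if that is the required one,
    # the running kernel is the same or newer.
    lowest=$(printf '%s\n%s\n' "$required" "$running" | sort -V | head -n 1)
    [ "$lowest" = "$required" ]
}
```

For example, `kernel_is_at_least 4.0 || echo "still on the old kernel"` would flag a host that booted 3.10 again because the default GRUB entry was not changed.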

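After the upgrade the problem has not recurred for 20 days, so I conclude that it was caused by a kernel bug and that upgrading the kernel resolved it.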

Date: 2024-10-10 17:30:32
