分布式通讯优化篇 – IRQ affinity

在一次C500K性能压测过程中,发现一个问题:8 processor的CPU,负载基本集中在CPU0,并且负载达到70以上,并通过mpstat发现CPU0每秒总中断(%irq+%soft)次数比较高。

基于对此问题的研究,解决和思考,便有了这篇文章,希望大家能够喜欢,也欢迎大家留言讨论。

在正文开始之前,我们先来看两个跟性能相关的基本概念:中断与上线文切换(在实际场景中,发现90%以上的同学无法解释清楚,希望这篇文章能给你带去比较深刻的理解)。

中断

        Hardware interrupts are used by devices to communicate that they require attention from the operating system. Internally, hardware interrupts are implemented usingelectronic alerting signals
that are sent to the processor from anexternal device, which is either a part of the computer itself, such as a disk controller, or an external peripheral. For example, pressing a key on the keyboard or moving the mouse triggers hardware interrupts
that cause the processor to read the keystroke or mouse position. Unlike the software type (described below), hardware interrupts areasynchronous and can occur in the middle of instruction execution, requiring additional care in programming.
The act of initiating a hardware interrupt is referred to as aninterrupt request (IRQ).

A software interrupt
is caused either by anexceptional condition in the processor itself, or aspecial instruction in the instruction set which causes an interrupt when it is executed. The former is often called a trap or exception
and is used for errors or events occurring during program execution that are exceptional enough that they cannot be handled within the program itself. For example, if the processor‘s arithmetic logic unit is commanded to divide a number by zero, this impossible
demand will cause a divide-by-zero exception, perhaps causing the computer to abandon the calculation or display an error message. Software interrupt instructionsfunction similarly to subroutine calls and are used for a variety of purposes,
such as to request services from low-level system software such as device drivers. For example, computers often use software interrupt instructions to communicate with the disk controller to request data be read or written to the disk.

硬中断,硬件中断CPU,通常是异步处理的;软中断,指令中断内核执行,分两种情况,一种是异常,另外一种是subroutine calls,注意这里要和system call分开。

上线文切换

In computing, a context switch is the process ofstoring and restoring the state (context) of a process or thread so that execution can be resumed from the same point at a later time. This enables
multiple processes to share a single CPU and is an essential feature of a multitasking operating system. What constitutes the context is determined by the processor and the operating system.Context switches are usually computationally intensive,
and much of the design of operating systems is to optimize the use of context switches. Switching from one process to another requires a certain amount of time for doing the administration – saving and loading registers and memory maps, updating various
tables and lists etc
. A context switch can mean aregister context switch, atask context switch, astack frame switch, athread context switch, ora process context switch.

上下文切换,发生在内核态,system call仅仅是kernel mode switch,上下文切换有多种表现形式,如进程之间,线程之间,栈帧之间等。

问题来了,中断和上下文切换之间究竟存在什么样的内在数理关系?翻阅了很多外文资料,无果而返。最后的Action,准备去check linux kernal代码中关于cs ,%soft和%irq的统计逻辑,如果你恰巧了解,还希望不吝赐教哈!

Ok,下面我们来看一下整个亲核优化过程中所需要掌握的基本技巧:RPS/RFS,irqbalance和irq affinity!

RPS/RFS - Receive Package Steering/Receive Flow Steering

Google同学开发的patch,从2.6.35开始加入到kernel中。简单来说,其原理是利用hash算法来hash TCP或者 UDP的 package header,并根据应用所在的CPU去选择软中断所需要的CPU。文档中有一句话,最能概括它的使用场景,如下。大致意思是说网卡单队列模式以及队列数少于CPU核数的场景下,如果能保证共享内存,用它无疑是最佳神器。

For a single queue device, a typical RPS configuration would be to set the rps_cpus to the CPUs in the same memory domain of the interrupting CPU. If NUMA locality is not an issue, this could also be all CPUs in the system. At high interrupt rate,
it might be wise to exclude the interrupting CPU from the map since that already performs much work. For a multi-queue system, if RSS is configured so that a hardware receive queue is mapped to each CPU, then RPS is probably redundant and unnecessary. If there
are fewer hardware queues than CPUs, then RPS might be beneficial if the rps_cpus for each queue are the ones that share the same memory domain as the interrupting CPU for that queue.

那问题又来了,如何辨别多队列网卡?如何保障共享内存?提供一种思路,对于第一个问题,可以用命令

        lspci -vvv | grep 'Ethernet controller'

如果有MSI-X && Enable+ && TabSize > 1,则该网卡是多队列网卡。对于第二个问题,可以考虑在lscpu的帮助下,将中断绑定到具体的物理CPU上。

Irqbalance

手册上是这么说的,distribute hardware interrupts across processors on a multiprocessor system。在SMP体系结构上问题还是蛮多的,可以参看Ubuntu的Bug追踪系统。当然,国内褚霸同学对其源码进行了详细分析,感兴趣的可以也参看这里

SMP IRQ Affinity

最后,来看一下kernel 2.4加入的SMP IRQ Affinity:

An interrupt request (IRQ) is a request for service, sent at the hardware level. Interrupts can be sent by either a dedicated hardware line, or across a hardware bus as an information packet (a Message Signaled Interrupt, or MSI). When interrupts are
enabled, receipt of an IRQ prompts a switch to interrupt context. Kernel interrupt dispatch code retrieves the IRQ number and its associated list of registered Interrupt Service Routines (ISRs), and calls each ISR in turn. The ISR acknowledges the interrupt
and ignores redundant interrupts from the same IRQ, then queues a deferred handler to finish processing the interrupt and stop the ISR from ignoring future interrupts.

/proc/interrupts列出了IRQ number, the number of that interrupt handled by each CPU core, the interrupt type, and a comma-delimited list of drivers that are registered to receive that interrupt. (Refer to the proc(5) man page for further details: man 5 proc).

/proc/irq/IRQ_NUMBER/smp_affinity,smp_affinity是用来描述中断亲和特性的,this property can be used to improve application performance by assigning both interrupt affinity and the application‘s thread affinity to one or more specific CPU cores. This allows cache line
sharing between the specified interrupt and application threads.

如何验证你的中断亲核性设置是否OK呢?请参看下面的流程:

a. 查看网卡中断号:

       cat /proc/interrupts

b. 查看该中断号的cpu affinity:

       sudo cat /proc/irq/42/smp_affinity

c. 修改绑定:

       sudo echo ff > /proc/irq/42/smp_affinity

d. 访问特定网站:

       ping -f www.creative.com

e. 查看中断绑定结果:

       cat /proc/interrupts | grep  'CPU\|42:'

小结:

在我的多队列网卡中,手动绑定了SMP IRQ Affinity的值,并且排除了其它两种优化方式的干扰,解决掉了开篇提到的性能问题。 但我文章里面提到的那个数理关系,后续还是需要跟进一下,如果有更多的发现,会及时分享给大家,希望大家能够喜欢!

参考文档:

1. http://en.wikipedia.org/wiki/Interrupt

2. http://en.wikipedia.org/wiki/Context_switch

3. http://wenku.baidu.com/view/315d2c8571fe910ef12df838.html

4. https://www.kernel.org/doc/Documentation/networking/scaling.txt

5. http://kernelnewbies.org/Linux_2_6_35

6. https://cs.uwaterloo.ca/~brecht/servers/apic/SMP-affinity.txt

7. http://www.linfo.org/context_switch.html

8. http://lwn.net/Articles/328339/

9. http://lwn.net/Articles/398385/

10. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/network-rps.html

11. https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html smp_affinity

12, http://www.linfo.org/context_switch.html

时间: 2024-10-19 17:56:08

分布式通讯优化篇 – IRQ affinity的相关文章

Linux 多核下绑定硬件中断到不同 CPU(IRQ Affinity) 转

硬件中断发生频繁,是件很消耗 CPU 资源的事情,在多核 CPU 条件下如果有办法把大量硬件中断分配给不同的 CPU (core) 处理显然能很好的平衡性能.现在的服务器上动不动就是多 CPU 多核.多网卡.多硬盘,如果能让网卡中断独占1个 CPU (core).磁盘 IO 中断独占1个 CPU 的话将会大大减轻单一 CPU 的负担.提高整体处理效率.VPSee 前天收到一位网友的邮件提到了 SMP IRQ Affinity,引发了今天的话题:D,以下操作在 SUN FIre X2100 M2

SMP IRQ Affinity

转:非常有用的方法,调式神器 Background: Whenever a piece of hardware, such as disk controller or ethernet card, needs attention from the CPU, it throws an interrupt.  The interrupt tells the CPU that something has happened and that the CPU should drop what it's d

基于Netty的聊天系统(一)通讯原理篇

今天周六,正好顺便把聊天系统的通讯原理写一下,本来是用XMPP+Openfire做了一个聊天,但是在做群聊那块需要去写插件来主动向表里变去写数据,因为openfire外国人写的,最初设计的群聊是会议室那种形式,和我们现在这种QQ群聊还是有差别的,改造起来比较麻烦,需要去通都源码等等,openfire是基于mina来写的,mina和netty又出自同一作者之手,那么我们就基于netty来写一个吧,首先我们来谈一谈通讯的原理 1:通讯原理 首先比方说A和B两个人要进行聊天(这里我们先讨论一对一这种聊

DFS分布式文件系统--基础篇

DFS分布式文件系统--基础篇 DFS是将相同的文件同时存储到网络上多台服务器上后,就可以有以下功能和优点: 提高文件的访问效率:DFS服务器会向客户端提供一个服务器列表,列表中的这些服务器内部有客户端所需要的文件.DFS会将最接近客户端的服务器,放在列表最前面,以便让客户端优先从这台服务器来访问文件 . 提高文件的可用性:当提供资源的服务器列表中的某一台服务器出现故障,客户端仍然可以从列表中的下一台服务器获取所需要的文件,即DFS提供排错功能. 服务器负载平衡功能:由于存放相同文件,有可能有多

画图工具之优化篇

import java.awt.BasicStroke; import java.awt.Color; import java.awt.Graphics; import java.awt.Graphics2D; import java.awt.RenderingHints; import java.awt.event.ActionEvent; import java.awt.event.ActionListener; /** * 1.新建一个LoginListener事件处理类, * 该类实现M

DFS分布式文件系统--部署篇

DFS分布式文件系统--部署篇 续DFS分布式文件系统--基础篇 三.DFS部署实例 在VMwareWorkstation 12.0虚拟环境下,建立五台Windows Server 2008 R2虚拟机. NameSrv01:192.168.0.180/24,DC,DNS: NameSrv02:192.168.0.181/24,辅助DC,DNS:域名FromHeart.Com. 这两台DC为命名空间服务器. ShareSrv01:192.168.0.182/24:ShareSrv02:192.1

nginx优化篇之Linux 内核参数的优化

原博客地址(欢迎访问):http://www.loveyqq.tk/blog/2014/05/27/nginxyou-hua-pian-zhi-linux-nei-he-can-shu-de-you-hua/ 由于默认的Linux内核参数考虑的是最通用的场景,这明显不符合用于支持高并发访问的Web服务器的定义,所以需要修改Linux内核参数,使得Nginx可以拥有更高的性能. 在优化内核时,可以做的事件很多,不过,我们通常会根据业务特点来进行调整,当Nginx作为静态Web内容服务器.反向代理服

Juniper老司机经验谈(SRX防火墙优化篇)视频课程上线了

大家在QQ群.论坛里经常提的问题,许多人对SRX双机不是很理解,实际工作中碰见太多问题,惹出了少少麻烦. 针对这个我录制了一个Juniper老司机经验谈(SRX防火墙优化篇)视频课程,上线了.只有9块钱,象征性收费,几天卖出了50多份. 主要内容如下: 1 juniper模拟器使用(windows篇) [免费观看] 40分钟 本章节介绍windows环境下,juniper模拟器的部署.为学习实验做好准备. 2 juniper模拟器使用(MAC篇) 17分钟 本章节介绍在MAC BOOK下,jun

全文检索之lucene的优化篇--创建索引库

在上一篇HelloWorld的基础上,建立一个directory的包,添加一个DirectoryTest的测试类,用来根据指定的索引目录创建目录存放指引. DirectoryTest类中的代码如下,基本上就是在HelloWorld的基础上改改就可以了. 里面一共三个方法,testDirectory(),测试创建索引库;testDirectoryFSAndRAM(),结合方法1的两种创建方式,优化;testDirectoryOptimize(),在方法2个基础上,研究索引的优化创建,减少创建的索引