理解LINUX LOAD AVERAGE的误区

一直不解,为什么io占用较高时,系统负载也会变高,偶遇此文,终解吾惑。

uptime和top等命令都可以看到load average指标,从左至右三个数字分别表示1分钟、5分钟、15分钟的load average:

$ uptime
 18:05:09 up 1534 days,  7:42,  1 user,  load average: 5.76, 5.54, 5.61

Load average的概念源自UNIX系统,虽然各家的公式不尽相同,但都是用于衡量正在使用CPU的进程数量和正在等待CPU的进程数量,一句话就是runnable processes的数量。所以load average可以作为CPU瓶颈的参考指标,如果大于CPU的数量,说明CPU可能不够用了。

但是,Linux上不是这样的!

Linux上的load average除了包括正在使用CPU的进程数量和正在等待CPU的进程数量之外,还包括uninterruptible sleep的进程数量。通常等待IO设备、等待网络的时候,进程会处于uninterruptible sleep状态。Linux设计者的逻辑是,uninterruptible sleep应该都是很短暂的,很快就会恢复运行,所以被等同于runnable。然而uninterruptible sleep即使再短暂也是sleep,何况现实世界中uninterruptible sleep未必很短暂,大量的、或长时间的uninterruptible sleep通常意味着IO设备遇到了瓶颈。众所周知,sleep状态的进程是不需要CPU的,即使所有的CPU都空闲,正在sleep的进程也是运行不了的,所以sleep进程的数量绝对不适合用作衡量CPU负载的指标,Linux把uninterruptible sleep进程算进load average的做法直接颠覆了load average的本来意义。所以在Linux系统上,load average这个指标基本失去了作用,因为你不知道它代表什么意思,当看到load average很高的时候,你不知道是runnable进程太多还是uninterruptible sleep进程太多,也就无法判断是CPU不够用还是IO设备有瓶颈。

参考资料:https://en.wikipedia.org/wiki/Load_(computing)“Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system.“

源代码:

RHEL6
kernel/sched.c:
===============

static void calc_load_account_active(struct rq *this_rq)
{
        long nr_active, delta;

        nr_active = this_rq->nr_running;
        nr_active += (long) this_rq->nr_uninterruptible;

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
                atomic_long_add(delta, &calc_load_tasks);
        }
}
RHEL7
kernel/sched/core.c:
====================

static long calc_load_fold_active(struct rq *this_rq)
{
        long nr_active, delta = 0;

        nr_active = this_rq->nr_running;
        nr_active += (long) this_rq->nr_uninterruptible;

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
        }

        return delta;
}
RHEL7
kernel/sched/core.c:
====================

/*
 * Global load-average calculations
 *
 * We take a distributed and async approach to calculating the global load-avg
 * in order to minimize overhead.
 *
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *      nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 *
 * Due to a number of reasons the above turns in the mess below:
 *
 *  - for_each_possible_cpu() is prohibitively expensive on machines with
 *    serious number of cpus, therefore we need to take a distributed approach
 *    to calculating nr_active.
 *
 *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
 *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
 *
 *    So assuming nr_active := 0 when we start out -- true per definition, we
 *    can simply take per-cpu deltas and fold those into a global accumulate
 *    to obtain the same result. See calc_load_fold_active().
 *
 *    Furthermore, in order to avoid synchronizing all per-cpu delta folding
 *    across the machine, we assume 10 ticks is sufficient time for every
 *    cpu to have completed this task.
 *
 *    This places an upper-bound on the IRQ-off latency of the machine. Then
 *    again, being late doesn‘t loose the delta, just wrecks the sample.
 *
 *  - cpu_rq()->nr_uninterruptible isn‘t accurately tracked per-cpu because
 *    this would add another cross-cpu cacheline miss and atomic operation
 *    to the wakeup path. Instead we increment on whatever cpu the task ran
 *    when it went into uninterruptible state and decrement on whatever cpu
 *    did the wakeup. This means that only the sum of nr_uninterruptible over
 *    all cpus yields the correct result.
 *
 *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
 */

参考:

http://linuxperf.com/?p=176

时间: 2024-08-03 11:21:35

理解LINUX LOAD AVERAGE的误区的相关文章

Linux Load average负载详细解释

http://tianmaotalk.iteye.com/blog/1027970 Linux Load average负载详细解释 linux查看机器负载

linux load average

性能分析_linux服务器CPU_Load Average 理解Linux系统中的load average(图文版) 理解Load Average做好压力测试 top命令的Load average 含义及性能参考基值 几点说明: 1.低利用率的情况下是否会有高Load Average的情况产生呢?理解占有时间和使用时间就可以知道,当分配时间片以后,是否使用完全取决于使用者,因此完全可能出现低利用率高Load Average的情况.由此来看,仅仅从CPU的使用率来判断CPU是否处于一种超负荷的工作

Load Average (系统负载)说明

linux load average 系统平均负载被定义为在特定时间间隔内运行队列中的平均进程数.如果一个进程满足以下条件则其就会位于运行队列中: - 它没有在等待I/O操作的结果 - 它没有主动进入等待状态(也就是没有调用'wait') - 没有被停止(例如:等待终止) load average: 0.09, 0.05, 0.01 三个数分别代表不同时间段的系统平均负载(一分钟.五 分钟.以及十五分钟) 从性能的角度上理解,一台主机拥有多核心的处理器与另台拥有同样数目的处理性能基本上可以认为是

理解Linux系统中的load average

理解Linux系统中的load average(图文版) 博客分类: Linux linux load nagios 一.什么是load average? linux系统中的Load对当前CPU工作量的度量 (WikiPedia: the system load is a measure of the amount of work that a computer system is doing).也有简单的说是进程队列的长度. Load Average 就是一段时间 (1 分钟.5分钟.15分钟

[转]理解Linux系统中的load average

转自:http://heipark.iteye.com/blog/1340384 谢谢,写的非常好的文章. 一.什么是load average linux系统中的Load对当前CPU工作量的度量 (WikiPedia: the system load is a measure of the amount of work that a computer system is doing).也有简单的说是进程队列的长度. Load Average 就是一段时间 (1 分钟.5分钟.15分钟) 内平均

理解Linux系统中的load average(图文版)转

一.什么是load average? linux系统中的Load对当前CPU工作量的度量 (WikiPedia: the system load is a measure of the amount of work that a computer system is doing).也有简单的说是进程队列的长度. Load Average 就是一段时间 (1 分钟.5分钟.15分钟) 内平均 Load . 我们可以通过系统命令"w"查看当前load average情况 [[email p

理解Linux中的load Averges

一.什么是load average? linux系统中的Load对当前CPU工作量的度量 (WikiPedia: the system load is a measure of the amount of work that a computer system is doing).也有简单的说是进程队列的长度. Load Average 就是一段时间 (1 分钟.5分钟.15分钟) 内平均 Load . 我们可以通过系统命令"w"查看当前load average情况 [[email p

理解Load Average做好压力测试(转)

转载自:http://www.blogjava.net/cenwenchu/archive/2008/06/30/211712.html SIP的第四期结束了,因为控制策略的丰富,早先的的压力测试结果已经无法反映在高并发和高压力下SIP的运行状况,因此需要重新作压力测试.跟在测试人员后面做了快一周的压力测试,压力测试的报告也正式出炉,本来也就算是告一段落,但第二天测试人员说要修改报告,由于这次作压力测试的同学是第一次作,有一个指标没有注意,因此需要修改几个测试结果.那个没有注意的指标就是load

理解Load Average做好压力测试

http://www.blogjava.net/cenwenchu/archive/2008/06/30/211712.html CPU时间片 为了提高程序执行效率,大家在很多应用中都采用了多线程模式,这样可以将原来的序列化执行变为并行执行,任务的分解以及并行执行能够极大地提高程序的运行效率.但这都是代码级别的表现,而硬件是如何支持的呢?那就要靠CPU的时间片模式来说明这一切.程序的任何指令的执行往往都会要竞争CPU这个最宝贵的资源,不论你的程序分成了多少个线程去执行不同的任务,他们都必须排队等