

man page excerpts (on RHEL 7)

# man mpstat
       %usr    Show the percentage of CPU utilization that occurred while executing at the user level (application).
       %nice   Show the percentage of CPU utilization that occurred while executing at the user level with nice priority.
       %sys    Show the percentage of CPU utilization that occurred while executing at the system level (kernel). Note that this does not include time spent servicing hardware and software interrupts.
       %iowait Show the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request.
       %irq    Show the percentage of time spent by the CPU or CPUs to service hardware interrupts.
       %soft   Show the percentage of time spent by the CPU or CPUs to service software interrupts.
       %steal  Show the percentage of time spent in involuntary wait by the virtual CPU or CPUs while the hypervisor was servicing another virtual processor.
       %guest  Show the percentage of time spent by the CPU or CPUs to run a virtual processor.
       %gnice  Show the percentage of time spent by the CPU or CPUs to run a niced guest.
       %idle   Show the percentage of time that the CPU or CPUs were idle and the system did not have an outstanding disk I/O request.

# man top
us, user : time running un-niced user processes
sy, system : time running kernel processes
ni, nice : time running niced user processes
id, idle : time spent in the kernel idle handler
wa, IO-wait : time waiting for I/O completion
hi : time spent servicing hardware interrupts
si : time spent servicing software interrupts
st : time stolen from this vm by the hypervisor


  • CPU Usage Time and Percentage
According to the mpstat man page: %usr + %nice + %sys + %iowait + %irq + %soft + %steal + %guest + %gnice + %idle = 100%

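%steal usually shows non-zero values only inside virtual machines, for example on a VPS whose host heavily overcommits CPU. %guest and %gnice are usually very low, and in /proc/stat guest time is already counted inside user (and guest_nice inside nice). The same breakdown can therefore be obtained from /proc/stat or top as: user + nice + system + idle + iowait + irq + softirq + steal = 100%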
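A minimal Python sketch of this breakdown, assuming the RHEL 7-era /proc/stat layout where the aggregate `cpu` line has ten columns (user through guest_nice). The helper names `parse_cpu_line` and `breakdown` are hypothetical. Because the counters are cumulative ticks since boot, the percentages are computed over the difference between two samples, and guest/guest_nice are left out of the total to avoid counting them twice.

```python
# Column order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ["user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice"]

def parse_cpu_line(line):
    """Parse the aggregate 'cpu ...' line of /proc/stat into a dict of ticks."""
    parts = line.split()
    if parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:]]
    # Older kernels report fewer columns; treat the missing ones as 0.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))

def breakdown(before, after):
    """Percentage of time in each state between two samples; sums to 100."""
    delta = {k: after[k] - before[k] for k in FIELDS}
    # guest/guest_nice are already included in user/nice, so only the
    # first eight fields make up the total.
    summed = FIELDS[:8]
    total = sum(delta[k] for k in summed)
    return {k: 100.0 * delta[k] / total for k in summed}
```

For example, `breakdown(parse_cpu_line(line1), parse_cpu_line(line2))` on two lines read a second apart from /proc/stat yields per-state percentages comparable to the top summary line.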

To calculate Linux CPU usage time, subtract the idle CPU time from the total CPU time:
Total CPU time since boot       = user + nice + system + idle + iowait + irq + softirq + steal
Total CPU Idle time since boot  = idle + iowait
Total CPU usage time since boot = (Total CPU time since boot) - (Total CPU Idle time since boot)
Total CPU percentage            = (Total CPU usage time since boot) / (Total CPU time since boot) X 100
Because these counters accumulate from boot, the result is the average since boot; for current utilization, read /proc/stat twice and apply the same formulas to the differences between the two samples.
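The formulas above can be sketched in Python as follows. The function names are hypothetical; `read_cpu_times` and `sample_usage` assume a Linux /proc/stat whose first line is the aggregate `cpu` line, while `usage_percent` is a pure function over two (total, idle) tuples.

```python
import time

def cpu_times_from_line(line):
    """Return (total, idle) ticks from an aggregate 'cpu' line of /proc/stat."""
    values = [int(v) for v in line.split()[1:9]]
    values += [0] * (8 - len(values))  # pad if an old kernel reports fewer columns
    user, nice, system, idle, iowait, irq, softirq, steal = values
    total = user + nice + system + idle + iowait + irq + softirq + steal
    return total, idle + iowait        # idle time = idle + iowait

def read_cpu_times(path="/proc/stat"):
    with open(path) as f:
        return cpu_times_from_line(f.readline())

def usage_percent(prev, curr):
    """CPU usage percentage between two (total, idle) samples."""
    total_delta = curr[0] - prev[0]
    idle_delta = curr[1] - prev[1]
    if total_delta <= 0:
        return 0.0
    return 100.0 * (total_delta - idle_delta) / total_delta

def sample_usage(interval=1.0):
    """Measure CPU usage over `interval` seconds on a live Linux system."""
    prev = read_cpu_times()
    time.sleep(interval)
    return usage_percent(prev, read_cpu_times())
```

`usage_percent((0, 0), read_cpu_times())` reproduces the since-boot figure, while `sample_usage(1.0)` gives utilization over the last second, which is the kind of interval figure top and `sar -u 1` report.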
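  • Linux process states
Running (TASK_RUNNING): combines the running and ready states; the process is either running or ready to run. Linux represents this state with the TASK_RUNNING macro.
Interruptible sleep (light sleep, TASK_INTERRUPTIBLE): the process is sleeping (blocked) while waiting for a resource and is woken when the resource becomes available; it can also be woken by a signal from another process or by a timer interrupt, after which it re-enters the run queue. Linux represents this state with the TASK_INTERRUPTIBLE macro.
Uninterruptible sleep (deep sleep, TASK_UNINTERRUPTIBLE): essentially the same as interruptible sleep, except that signals from other processes cannot wake it. Linux represents this state with the TASK_UNINTERRUPTIBLE macro.
Stopped (TASK_STOPPED): execution is suspended pending some handling, for example a process stopped by SIGSTOP or one being debugged. Linux represents this state with the TASK_STOPPED macro.
Zombie (TASK_ZOMBIE): the process has exited but its PCB (process control block, task_struct in Linux) has not yet been released because its parent has not reaped it. Older kernels used the TASK_ZOMBIE macro; later kernels record this as EXIT_ZOMBIE in task_struct's exit_state.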
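From user space these states appear as the one-letter state field of /proc/[pid]/stat, the same letters `ps` prints in its STAT column (R, S, D, T, Z). A minimal Python sketch that tallies processes by state, assuming a Linux /proc; `STATE_NAMES`, `state_of` and `count_states` are hypothetical helpers, and the mapping covers only the five states described above (any other letter is simply counted under its own key).

```python
import os
from collections import Counter

# State letters in /proc/[pid]/stat mapped to the kernel states above.
STATE_NAMES = {
    "R": "TASK_RUNNING",
    "S": "TASK_INTERRUPTIBLE",
    "D": "TASK_UNINTERRUPTIBLE",
    "T": "TASK_STOPPED",
    "Z": "zombie (EXIT_ZOMBIE)",
}

def state_of(stat_line):
    """Return the state letter from one /proc/[pid]/stat line.

    The second field (comm) is wrapped in parentheses and may itself
    contain spaces or ')', so split after the *last* ')'.
    """
    rest = stat_line[stat_line.rindex(")") + 1:]
    return rest.split()[0]

def count_states(proc="/proc"):
    """Count processes per state letter by scanning /proc/[pid]/stat."""
    counts = Counter()
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                counts[state_of(f.read())] += 1
        except OSError:
            continue  # the process exited while we were scanning
    return counts
```

A persistent count of D (uninterruptible) processes is often worth a closer look, since such tasks also contribute to the load average.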
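  • Understanding %iowait correctly
%iowait is the percentage of time within a sampling interval during which the CPU was idle and at least one I/O request was still outstanding.
There are two common misconceptions about %iowait: first, that it represents time during which the CPU cannot do work (in fact the CPU is idle and can run any other runnable process); second, that it indicates an I/O bottleneck.

First, a rise in %iowait proves neither that more processes are waiting for I/O nor that the total time spent waiting for I/O has increased. For example, I/O issued while the CPU is busy does not change %iowait, however much or little I/O there is; when the CPU becomes less busy, some of that I/O falls into idle periods and %iowait rises. Likewise, with low I/O concurrency %iowait tends to be high, while with high I/O concurrency it may be fairly low.
%iowait is therefore a very fuzzy metric. If %iowait rises, also check whether the I/O volume has increased noticeably, whether avserv/avwait/avque (on Linux, the iostat -x columns such as svctm, await and avgqu-sz) have grown noticeably, and whether the application has slowed down. If none of these has happened, there is nothing to worry about.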
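To make these points concrete, here is a toy Python model, not how the kernel actually accounts iowait: a single CPU, time divided into equal ticks, and a tick counts toward %iowait when the CPU is idle and at least one I/O is outstanding. The helpers `timeline` and `iowait_percent` are hypothetical. Within each pair of runs the I/O work is identical; only CPU load or I/O concurrency changes.

```python
def timeline(n, *spans):
    """Boolean timeline of n ticks, True inside each half-open [start, end) span."""
    t = [False] * n
    for start, end in spans:
        for i in range(start, end):
            t[i] = True
    return t

def iowait_percent(cpu_busy, io_outstanding):
    """%iowait over a timeline: ticks where the CPU is idle AND I/O is pending."""
    if len(cpu_busy) != len(io_outstanding):
        raise ValueError("timelines must have the same length")
    ticks = sum(1 for busy, io in zip(cpu_busy, io_outstanding) if not busy and io)
    return 100.0 * ticks / len(cpu_busy)

N = 100
io = timeline(N, (40, 60))        # same I/O in both runs: pending during ticks 40-59

busy_cpu = timeline(N, (0, 80))   # CPU busy while the I/O is pending
light_cpu = timeline(N, (0, 30))  # CPU load drops; the I/O now lands in idle time
print(iowait_percent(busy_cpu, io))   # 0.0
print(iowait_percent(light_cpu, io))  # 20.0

# Concurrency: four 10-tick I/Os on an otherwise idle CPU.
idle_cpu = timeline(N)
serial = timeline(N, (0, 10), (10, 20), (20, 30), (30, 40))
parallel = timeline(N, (0, 10), (0, 10), (0, 10), (0, 10))
print(iowait_percent(idle_cpu, serial))    # 40.0
print(iowait_percent(idle_cpu, parallel))  # 10.0
```

The same I/O produces anywhere from 0% to 40% iowait in this model, which is why I/O volume, latency and application response time are better evidence of an I/O problem.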
  • To view CPU utilization, the following Linux commands are recommended:
# top
# sar -u 1 5
# vmstat -n 1 5
# mpstat -P ALL 1 5
  • To view the load average, the following Linux commands are recommended:
# top
# uptime
# sar -q 1 5
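The three load averages shown by top and uptime are also exposed in /proc/loadavg. On Linux they count tasks that are runnable (R) or in uninterruptible sleep (D), so a high load does not necessarily mean the CPUs are busy. A minimal Python parser, assuming the five-field layout documented in proc(5); `parse_loadavg` and `read_loadavg` are hypothetical names.

```python
def parse_loadavg(text):
    """Parse /proc/loadavg content, e.g. '0.20 0.18 0.12 1/80 11206'."""
    f = text.split()
    runnable, total = (int(x) for x in f[3].split("/"))
    return {
        "load1": float(f[0]),     # 1-minute load average
        "load5": float(f[1]),     # 5-minute load average
        "load15": float(f[2]),    # 15-minute load average
        "runnable": runnable,     # currently runnable scheduling entities
        "total_tasks": total,     # scheduling entities that currently exist
        "last_pid": int(f[4]),    # PID most recently created
    }

def read_loadavg(path="/proc/loadavg"):
    with open(path) as fh:
        return parse_loadavg(fh.read())
```

Dividing `load1` by the number of CPUs (for example `os.cpu_count()`) gives a rough per-CPU figure that is easier to compare across machines.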
