Monitoring High CPU Usage and Analyzing Logs on Linux

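When a Linux system shows excessive CPU usage, the problem can be tracked down with scripts:

1. Monitor in real time; as soon as a process with high CPU usage appears, the script kicks in;

2. Analyze that process to identify the threads responsible;

3. Analyze the log files of the program those threads belong to; WebSphere middleware, for example, keeps a very thorough set of log files;

4. Examine the error and warning entries in the log files in detail. Log files are sometimes very large and details are easy to overlook, so combining sed and awk with regular expressions helps pinpoint the errors without missing anything.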
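Step 4 can be sketched with awk alone. The function below is only an illustration: the name `scan_log` and the keywords it matches are assumptions, not part of the original scripts, and a real WebSphere log may need a pattern tailored to its own severity codes.

```shell
#!/bin/bash
# scan_log FILE (hypothetical helper): print every line that mentions an error
# or a warning, prefixed with its line number, then a one-line summary.
# The keywords "error" and "warn" are assumptions; adapt them to the log format.
scan_log() {
    awk '
        {
            line = tolower($0)                 # case-insensitive matching
            if (line ~ /error/)     { errors++;   printf "%d:ERROR:%s\n", NR, $0 }
            else if (line ~ /warn/) { warnings++; printf "%d:WARN:%s\n",  NR, $0 }
        }
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    ' "$1"
}

# Example call (the path is a placeholder):
# scan_log /path/to/SystemOut.log
```

Because every match carries its line number, the interesting region of a very large log can then be printed with something like `sed -n '120,140p' FILE`.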

The approach here uses two scripts, one local and one remote, to monitor the host, locate the relevant log files, and analyze them.
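Step 1 calls for a trigger that fires once a process runs hot. A minimal sketch of such a trigger follows; `CPU_LIMIT`, `POLL_SECONDS`, and the three function names are hypothetical and do not appear in the scripts below. It assumes the procps version of `ps`, which supports `--sort` and `--no-headers`.

```shell
#!/bin/bash
# Hypothetical threshold (percent) and polling interval (seconds); tune both.
CPU_LIMIT=80
POLL_SECONDS=10

# Print "pcpu pid" for the process currently reported with the highest %CPU.
top_cpu_process() {
    ps -eo pcpu,pid --sort=-pcpu --no-headers | head -n 1
}

# Succeed (exit status 0) when $1 is numerically greater than $2.
# awk is used because the shell's own arithmetic cannot compare decimals.
exceeds_limit() {
    awk -v cpu="$1" -v limit="$2" 'BEGIN { exit !(cpu + 0 > limit + 0) }'
}

# Poll until some process crosses the limit, report it, and return.
watch_cpu() {
    while true; do
        set -- $(top_cpu_process)      # word-split into: $1=pcpu $2=pid
        if exceeds_limit "$1" "$CPU_LIMIT"; then
            echo "PID $2 is using $1% CPU"
            return 0
        fi
        sleep "$POLL_SECONDS"
    done
}
```

Calling `watch_cpu` and then starting the analysis would chain the two steps. Note that procps `ps` reports %CPU averaged over the lifetime of the process, so a short spike in a long-running process may not cross the limit.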

highCpuAnalysis_l.sh:

#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198
# This script collects and analyzes data for performance and high-CPU issues on Linux
# Usage:    ./highCpuAnalysis_l.sh USER IP
# Author: HuangTao
# Email:[email protected]126.com
#
###############################################################################
##########################
#  Define Variables      #
##########################
# Note: this overrides the shell's own USER variable for the rest of the script.
export USER="$1"
export IP="$2"

## Usage:
if [ $# -lt 2 ]
then
echo " USER and IP were not supplied."
echo " Please rerun the script as follows: ./highCpuAnalysis_l.sh USER IP"
echo " e.g.: ./highCpuAnalysis_l.sh root 192.168.86.198"
exit 1
fi

## Get the remote server's WAS application server name
## (the last argument of the process using the most CPU; line 1 of the ps output is the header)
export wasappname=$(ssh "$USER@$IP" ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p' | awk '{print $NF}')

## Get the remote server's hostname
export remotehostname=$(ssh "$USER@$IP" hostname)

## Get the current directory
export dir=$(pwd)

###############################################################################
##Copy the script:highCpuAnalysis_r.sh to target host
echo "*********************************************************************"
echo "Step 1: "
echo "Copying highCpuAnalysis_r.sh to the remote host ..."
scp highCpuAnalysis_r.sh "$USER@$IP:/tmp/"
ssh "$USER@$IP" chmod 755 /tmp/highCpuAnalysis_r.sh
echo "highCpuAnalysis_r.sh is now RUNNING on $remotehostname($IP). "

###############################################################################
##run the script, make the script run on target remote host:
ssh "$USER@$IP" /tmp/highCpuAnalysis_r.sh
echo "*************************************************************************"
echo "Step 6:"
echo "Copy the report and javacore to the local analysis host:"
###############################################################################
## Copy the report and javacore to the local host, then delete them from the remote host:
scp "$USER@$IP:/tmp/HighCpuReport*" .
scp "$USER@$IP:/tmp/javacore*.gz" .
tar -zxvf javacore*.gz

## Remove all related files on the remote server
ssh "$USER@$IP" rm -f '/tmp/HighCpu*Report*'
ssh "$USER@$IP" rm -f '/tmp/javacore*'
ssh "$USER@$IP" rm -f /tmp/highCpuAnalysis_r.sh
ssh "$USER@$IP" rm -f '/tmp/topdashH.*'
echo "   "
echo "*********************************************************************"
echo "Step 7:"
echo "Show All information:"
echo "Remote hostname: $remotehostname($IP)."
echo "Remote Appserver name:$wasappname."
echo "Report and javacore:"

rm -f javacore*.gz
ls -rlt HighCpu*Report* |tail -1
ls -rtl javacore* |tail -3

echo  "*******************************END**********************************"
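Ranking `ps` output by %CPU needs a numeric comparison. A plain `sort -r` compares text, so 9.0 would rank above 85.2 and 250. The sketch below shows the difference on made-up data; `top_by_cpu` and `sample` are hypothetical names used only for this illustration.

```shell
#!/bin/bash
# top_by_cpu [N] (hypothetical helper): read "pcpu pid" lines on stdin and
# print the N busiest entries (default 1).
# -n compares numerically; -k1,1 restricts the sort key to the first column.
top_by_cpu() {
    sort -k1,1 -n -r | head -n "${1:-1}"
}

# Made-up sample: three processes at 9.0%, 85.2% and 250% CPU.
sample() {
    printf '%s\n' '9.0 111' '85.2 222' '250 333'
}

sample | top_by_cpu 2
# prints:
# 250 333
# 85.2 222
```

With procps `ps`, the same ranking is available directly through `ps -eo pcpu,pid --sort=-pcpu`, which also keeps the header on the first line.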
 

highCpuAnalysis_r.sh

#!/bin/bash
###############################################################################
# The source code was created on 10.19.90.165 and 192.168.86.198.
# This script collects and analyzes data for performance and high-CPU issues on Linux
# Usage:    ./highCpuAnalysis_r.sh
# Author: HuangTao
# Email:[email protected]126.com
#
###############################################################################
##########################
#  Define Variables      #
##########################
# Interval in seconds between top -H samples.
TOP_DASH_H_VAL=30
# Number of top -H samples to take.
TOP_DASH_H_VAL_T=3

# Interval in seconds between javacores.
JAVACORE_VAL=60
# Number of javacores to take.
JAVACORE_VAL_T=3

## Get the PID of the process using the most CPU (line 1 of the ps output is the header)
export pid=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p' | awk '{print $3}')

## Convert the PID from decimal to hexadecimal (base 10 to base 16)
export pid16=$(echo "obase=16; $pid" | bc)

## Get the owner of that process, used below to check whether it is a WAS process
export was=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p' | awk '{print $4}')

## Get the WAS application server name
export wasappname=$(ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | sed -n '2p' | awk '{print $NF}')

## Get the hostname
export hostname=$(hostname)

###############################################################################
##########################
# Get High CPU PID       #
##########################
## Write the report to /tmp/HighCpuReport.$pid.$hostname.out
echo "Script executed at:" $(date)  > /tmp/HighCpuReport.$pid.$hostname.out
echo "   "
if [ "$was" = wasuser ]  || [ "$was" = wasadmin ]
then
echo "*********************************************************************"
echo "Step 2:"
echo "The highest-CPU PID is $pid; the process is a WAS process. "
else
echo "The highest-CPU PID $pid is NOT a WAS process."  | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "
exit 1
fi
sleep 1;
echo "*********************************************************************"
###############################################################################
#########################
#                       #
# Start collection of:  #
#  * top dash H         #
#                       #
#########################
# Start the collection of top dash H data.
echo  "Step 3:"
echo  "Starting collection of top dash H data ..."
echo  "This step needs $((TOP_DASH_H_VAL*TOP_DASH_H_VAL_T)) seconds to complete:"
      top -bH -d $TOP_DASH_H_VAL -n $TOP_DASH_H_VAL_T -p $pid > /tmp/topdashH.$pid.$hostname.out
    # e.g.:  top -bH -d 30 -n 3 -p 7031
    # e.g.:  grep -v Swap toplog.out |grep -v Task |grep -v "Cpu(s)"|grep -v "Mem:" |grep -v top| sort -k 1 -r | head -10 | sed -n '2p' |awk '{print $3}'
#echo "Analyzing the snapshot in /tmp/topdashH.$pid.$hostname.out can reveal the high-CPU thread" ;
echo  "Collected the top dash H data."
sleep 2;
###############################################################################
###########################
#  Find the top 10 threads by CPU
#  and by TIME consumption.
###########################

## Delete /tmp/topdashH.$pid.$hostname.out once data collection is complete

################################################################################
# Start collection of:  #
#  * javacores          #
#########################
# Javacores are output to the working directory of the JVM; in most cases this is the <profile_root>
echo "*********************************************************************"
echo  "Step 4:"
echo  "Starting collection of Javacores ..."
echo  "This step needs $((JAVACORE_VAL*JAVACORE_VAL_T)) seconds to complete:"
## First clear any existing javacores for this PID:
rm -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*
## Then generate the javacores (kill -3 sends SIGQUIT to the JVM):
        kill -3 $pid ;
        echo "Collected the first javacore for PID $pid ."
        sleep $JAVACORE_VAL

        kill -3 $pid ;
        echo "Collected the second javacore for PID $pid ."
        sleep $JAVACORE_VAL

        kill -3 $pid ;
        echo "Collected the third javacore for PID $pid ."
        sleep $JAVACORE_VAL    

## Move the javacores to /tmp and compress them:
rm -f /tmp/javacore*
mv -f /opt/IBM/WebSphere/AppServer/profiles/$wasappname/javacore*$pid*     /tmp/
cd /tmp
tar -zcvf javacore.$(date +%Y%m%d"."%H%M%S).$pid.gz javacore*$pid*
################################################################################

echo "*********************************************************************"
echo  "Step 5:"
echo  "Print out the analysis information:"
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*********The top 10 CPU-consuming PROCESSES:*************************"   | tee -a /tmp/HighCpuReport.$pid.$hostname.out
ps -eo pcpu,pmem,pid,user,args --sort=-pcpu | head -10                          | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "

echo "****The top 10 CPU-consuming *Threads* of process $pid:**************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
cat /tmp/topdashH.$pid.$hostname.out|grep -v Cpu|sort -k9  -n -r  -k1 -u |head -10             | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "                                                                      | tee -a /tmp/HighCpuReport.$pid.$hostname.out

echo "****The top 10 TIME-consuming *Threads* of process $pid:*************"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND "    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
cat /tmp/topdashH.$pid.$hostname.out | grep -v Cpu|sort -k11  -n  -r -k1 -u  |head -10         | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "*~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~*"    | tee -a /tmp/HighCpuReport.$pid.$hostname.out
echo "   "

echo "Please check the javacore and HighCpuReport.$pid.$hostname.out under the current directory."
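Once the report names the busiest threads, the usual next step is to look each one up in a javacore. A minimal sketch follows, assuming the javacore records native thread IDs in hexadecimal in the form `native thread ID:0x...`; the function names `tid_to_hex` and `find_thread` are hypothetical and not part of the scripts above.

```shell
#!/bin/bash
# tid_to_hex TID (hypothetical helper): convert a decimal thread ID, as shown
# in the PID column of top -H, to upper-case hexadecimal.
tid_to_hex() {
    printf '%X\n' "$1"
}

# find_thread TID FILE... (hypothetical helper): list the javacore lines that
# mention the thread's native ID. The trailing group stops 0x1B77 from also
# matching a longer ID such as 0x1B770.
find_thread() {
    tid=$1
    shift
    grep -i -n -E "native thread ID:0x$(tid_to_hex "$tid")([^0-9a-f]|\$)" "$@"
}

# Example call (the thread ID and file pattern are placeholders):
# find_thread 7031 javacore*.txt
```

The stack trace printed after the matching line in the javacore then shows what that thread was executing when the dump was taken.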
 

As for why two scripts are used and how well they work, I hope to have the chance to discuss that with you in person.

Date: 2024-10-15 11:46:57
