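This article is based on Android 4.4.
I recently looked into a problem where the watchdog was printing error logs, and it gave me quite a headache. Along the way I read through the watchdog implementation in the Android framework, so I am writing it down for my own later review and so that newcomers can get up to speed quickly.
PowerManagerService is used as the example for a brief walk-through of the flow.
What Watchdog does:
1. Monitors the reboot broadcast
2. Monitors whether the services added to its check list are deadlocked
Details:
Function 1 is very simple: it registers a BroadcastReceiver and, when the shutdown/reboot Action arrives, runs the shutdown or reboot flow.
The rest of this article covers function 2.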
Watchdog.java:
Start with the constructor:
private Watchdog() {
    super("watchdog");
    // Initialize handler checkers for each common thread we want to check.  Note
    // that we are not currently checking the background thread, since it can
    // potentially hold longer running operations with no guarantees about the timeliness
    // of operations there.

    // The shared foreground thread is the main checker.  It is where we
    // will also dispatch monitor checks and do other work.
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Add checker for main thread.  We only do a quick check since there
    // can be UI running on the thread.
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Add checker for shared UI thread.
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // And also check IO thread.
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
}
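The important part is the set of checkers added here: one each for UiThread, FgThread and IoThread, plus one for the main thread of the process that creates the Watchdog, which is in fact the system_server main thread.
Next, init() in PowerManagerService.java: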
public void init(Context context, LightsService ls,
        ActivityManagerService am, BatteryService bs, IBatteryStats bss,
        IAppOpsService appOps, DisplayManagerService dm) {
    ......
    mHandlerThread = new HandlerThread(TAG);
    mHandlerThread.start();
    mHandler = new PowerManagerHandler(mHandlerThread.getLooper());
    Watchdog.getInstance().addMonitor(this); // add this object to the monitor list
    Watchdog.getInstance().addThread(mHandler, mHandlerThread.getName());
    ......
}
public void addMonitor(Monitor monitor) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Monitors can't be added once the Watchdog is running");
        }
        mMonitorChecker.addMonitor(monitor);
    }
}
mMonitorChecker was created in the Watchdog constructor:
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
so the call above ends up in HandlerChecker.addMonitor():
public void addMonitor(Monitor monitor) {
    // mMonitors is an ArrayList: ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    // The PowerManagerService object is added to this list
    mMonitors.add(monitor);
}
Next, Watchdog.getInstance().addThread(mHandler, mHandlerThread.getName()):
public void addThread(Handler thread, String name) {
    addThread(thread, name, DEFAULT_TIMEOUT);
}
public void addThread(Handler thread, String name, long timeoutMillis) {
    synchronized (this) {
        if (isAlive()) {
            throw new RuntimeException("Threads can't be added once the Watchdog is running");
        }
        // Wrap the PowerManagerHandler object in a HandlerChecker and add it to mHandlerCheckers
        mHandlerCheckers.add(new HandlerChecker(thread, name, timeoutMillis));
    }
}
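The two registration paths can be modelled in plain Java to make the resulting data structure visible. This is only a sketch: CheckListModel, Checker, the running flag (standing in for isAlive()) and the 60-second DEFAULT_TIMEOUT value are all assumptions made for illustration, not the framework's code. What it shows is that every monitor is attached to the single foreground-thread checker, while every addThread() call creates a checker of its own.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java model of the Watchdog check list; every name here is illustrative. */
final class CheckListModel {
    /** Stand-in for Watchdog.Monitor. */
    interface Monitor {
        void monitor();
    }

    /** Stand-in for HandlerChecker: one monitored thread plus the monitors run on it. */
    static final class Checker {
        final String name;
        final long timeoutMillis;
        final List<Monitor> monitors = new ArrayList<>();

        Checker(String name, long timeoutMillis) {
            this.name = name;
            this.timeoutMillis = timeoutMillis;
        }
    }

    static final long DEFAULT_TIMEOUT = 60_000L; // assumed value, for illustration only

    private final List<Checker> checkers = new ArrayList<>();
    private final Checker monitorChecker = new Checker("foreground thread", DEFAULT_TIMEOUT);
    private boolean running; // stands in for Thread.isAlive() in the real class

    CheckListModel() {
        checkers.add(monitorChecker);
    }

    /** All monitors share the one foreground-thread checker. */
    synchronized void addMonitor(Monitor monitor) {
        requireNotRunning("Monitors");
        monitorChecker.monitors.add(monitor);
    }

    /** Each registered thread gets a checker of its own. */
    synchronized void addThread(String name, long timeoutMillis) {
        requireNotRunning("Threads");
        checkers.add(new Checker(name, timeoutMillis));
    }

    synchronized void start() {
        running = true;
    }

    synchronized int checkerCount() {
        return checkers.size();
    }

    synchronized int monitorCount() {
        return monitorChecker.monitors.size();
    }

    private void requireNotRunning(String what) {
        if (running) {
            throw new RuntimeException(what + " can't be added once the watchdog is running");
        }
    }

    public static void main(String[] args) {
        CheckListModel watchdog = new CheckListModel();
        watchdog.addMonitor(() -> { /* a real monitor would lock the service's mLock */ });
        watchdog.addThread("PowerManagerService", DEFAULT_TIMEOUT);
        System.out.println(watchdog.checkerCount() + " checkers, "
                + watchdog.monitorCount() + " monitor(s)");
    }
}
```

In this model, registering PowerManagerService adds one monitor to the shared checker and one extra checker for its handler thread, mirroring the two calls made in init().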
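That completes the preparation. From here on, the watchdog keeps checking whether each service is deadlocked.
The code is mainly in Watchdog.java.
system_server starts the watchdog, which then enters its run() function.
Watchdog runs in a separate thread; the code of its thread method run() is as follows: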
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final ArrayList<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        synchronized (this) {
            long timeout = CHECK_INTERVAL;
            // Post a check message to every monitored thread
            for (int i = 0; i < mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }
            // Sleep for a while
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }
            // Check whether any thread or service is in trouble
            final int waitState = evaluateCheckerCompletionLocked();
            if (waitState == COMPLETED) {
                waitedHalf = false;
                continue;
            } else if (waitState == WAITING) {
                continue;
            } else if (waitState == WAITED_HALF) {
                if (!waitedHalf) {
                    ArrayList<Integer> pids = new ArrayList<Integer>();
                    pids.add(Process.myPid());
                    ActivityManagerService.dumpStackTraces(true, pids, null, null,
                            NATIVE_STACKS_OF_INTEREST);
                    waitedHalf = true;
                }
                continue;
            }
        ......
        {
            // Kill system_server
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
        waitedHalf = false;
    }
}
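The run() method contains an infinite loop, and each iteration mainly does three things:
1. Call scheduleCheckLocked() to post a message to every monitored thread. The code of scheduleCheckLocked() is as follows: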
public void scheduleCheckLocked() {
    if (mMonitors.size() == 0 && mHandler.getLooper().isIdling()) {
        mCompleted = true;
        return;
    }
    if (!mCompleted) {
        return;
    }
    mCompleted = false;
    mCurrentMonitor = null;
    mStartTime = SystemClock.uptimeMillis();
    mHandler.postAtFrontOfQueue(this); // post a message to the monitored thread
}
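A HandlerChecker object monitors both services and a thread, which is why the code above first checks whether mMonitors.size() is 0. If it is 0, this HandlerChecker monitors no services; if the monitored thread's message queue is also idle (checked with isIdling()), the thread is running fine, so mCompleted is set to true and the method returns. If mCompleted is still false, the previous check has not finished yet, so the method returns without posting another one. Otherwise it sets mCompleted to false, records the time the message is sent in mStartTime, and finally calls postAtFrontOfQueue() to post a message to the monitored thread. The message is then dispatched by this code in Handler.java: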
public void dispatchMessage(Message msg) {
    if (msg.callback != null) {
        handleCallback(msg);
    } else {
        if (mCallback != null) {
            if (mCallback.handleMessage(msg)) {
                return;
            }
        }
        handleMessage(msg);
    }
}
private static void handleCallback(Message message) {
    message.callback.run();
}
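Because the HandlerChecker itself was posted as the Runnable, msg.callback is not null, so the message is handled by the run() method of the HandlerChecker class. Its code is as follows: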
public void run() {
    final int size = mMonitors.size();
    for (int i = 0; i < size; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        mCurrentMonitor.monitor();
    }
    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
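If the message handler run() gets to execute, the monitored thread itself is fine. The state of the monitored services still has to be checked, though, and that is done by calling the monitor() method each service implements. A typical monitor() implementation acquires the service's lock; if it cannot get the lock, the thread is suspended, and mCompleted never gets set to true.
mCompleted being true means the thread or services monitored by this HandlerChecker are healthy. Otherwise there may be a problem; whether there really is one is decided by whether the wait has exceeded the allowed time.
monitor() is typically implemented as follows: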
public void monitor() {
    synchronized (mLock) {
    }
}
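Why an empty synchronized block is enough to detect a stuck service can be shown with a small plain-Java experiment. This is a sketch under stated assumptions, not framework code: MonitorCheckDemo, FakeService and checkCompletes() are made-up names, a single-thread ExecutorService stands in for the monitored Handler thread, and a CountDownLatch plays the role of mCompleted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Plain-Java model of one monitor check; every name here is illustrative. */
final class MonitorCheckDemo {
    /** Stand-in for a service whose monitor() only acquires and releases its lock. */
    static final class FakeService {
        final Object mLock = new Object();

        void monitor() {
            synchronized (mLock) {
                // Nothing to do: being able to take the lock is the whole check.
            }
        }
    }

    /**
     * Posts a check to a single worker thread (the "monitored thread") and reports
     * whether it finished within waitMillis. With holdLock set, the caller keeps the
     * service lock for the whole wait, which simulates a service stuck holding it.
     */
    static boolean checkCompletes(boolean holdLock, long waitMillis) {
        FakeService service = new FakeService();
        CountDownLatch done = new CountDownLatch(1); // plays the role of mCompleted
        ExecutorService monitored = Executors.newSingleThreadExecutor();
        Runnable check = () -> {
            service.monitor();
            done.countDown();
        };
        try {
            if (holdLock) {
                synchronized (service.mLock) {
                    monitored.execute(check);
                    return await(done, waitMillis);
                }
            }
            monitored.execute(check);
            return await(done, waitMillis);
        } finally {
            monitored.shutdown();
        }
    }

    private static boolean await(CountDownLatch latch, long millis) {
        try {
            return latch.await(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("lock held: completed = " + checkCompletes(true, 200));
        System.out.println("lock free: completed = " + checkCompletes(false, 2000));
    }
}
```

While the lock is held elsewhere, the check never reaches the line that marks it complete, which is exactly the condition the watchdog later turns into a timeout.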
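2. After posting the messages to the monitored threads, call wait() to put the Watchdog thread to sleep for a while.
3. Check each thread and service for problems; if one has exceeded its timeout, the process is killed immediately.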
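The sleep in step 2 is not a single wait() call: run() recomputes the remaining time after every wakeup, so an early return from wait() cannot shorten the interval. The sketch below reproduces that loop in plain Java. FullIntervalWait, waitFullInterval() and demo() are hypothetical names, and System.nanoTime() is used only as a stand-in for SystemClock.uptimeMillis(), which is not available off-device.

```java
/** Sketch of the "sleep a full interval" loop; every name here is illustrative. */
final class FullIntervalWait {
    /** Monotonic milliseconds; a stand-in for SystemClock.uptimeMillis(). */
    static long nowMillis() {
        return System.nanoTime() / 1_000_000L;
    }

    /**
     * Waits on lock until at least intervalMillis have elapsed, even if wait()
     * returns early because of notify(), an interrupt, or a spurious wakeup.
     * Returns the elapsed time in milliseconds.
     */
    static long waitFullInterval(Object lock, long intervalMillis) {
        synchronized (lock) {
            long start = nowMillis();
            long timeout = intervalMillis;
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // Like the loop in run(), note the interruption and keep waiting.
                }
                // Remaining time = interval minus what has already passed.
                timeout = intervalMillis - (nowMillis() - start);
            }
            return nowMillis() - start;
        }
    }

    /** Wakes the waiter early once and returns how long the wait really took. */
    static long demo(long intervalMillis, long notifyAfterMillis) {
        final Object lock = new Object();
        Thread notifier = new Thread(() -> {
            try {
                Thread.sleep(notifyAfterMillis);
            } catch (InterruptedException ignored) {
                // Exit quietly; the waiter does not depend on this thread.
            }
            synchronized (lock) {
                lock.notifyAll();
            }
        });
        notifier.setDaemon(true);
        notifier.start();
        return waitFullInterval(lock, intervalMillis);
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + demo(200, 50));
    }
}
```

Even though the notifier wakes the waiter after about 50 ms, the loop goes back to sleep for the remainder, so the reported elapsed time is never below the requested interval.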
run() calls evaluateCheckerCompletionLocked() to check whether any thread or service is in trouble. Its code is as follows:
private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i = 0; i < mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}
getCompletionStateLocked() decides which state to return for a HandlerChecker based on how long it has been waiting. Its code is as follows:
public int getCompletionStateLocked() {
    if (mCompleted) {
        return COMPLETED;
    } else {
        long latency = SystemClock.uptimeMillis() - mStartTime;
        if (latency < mWaitMax / 2) {
            return WAITING;
        } else if (latency < mWaitMax) {
            return WAITED_HALF;
        }
    }
    return OVERDUE;
}
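A worked example makes the thresholds and the Math.max() aggregation concrete. The sketch below restates the two methods as pure functions so they can run off-device. CompletionStates, completionState() and worstState() are made-up names; the numeric values of the four constants and the 60-second timeout are assumptions chosen for illustration. What matters for Math.max() is only that a worse state has a larger value.

```java
/** Pure-function restatement of the state logic; every name here is illustrative. */
final class CompletionStates {
    // Assumed values: only the ordering (worse state = larger number) matters.
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    /** Same thresholds as getCompletionStateLocked(), with the inputs passed in. */
    static int completionState(boolean completed, long latencyMillis, long waitMaxMillis) {
        if (completed) {
            return COMPLETED;
        }
        if (latencyMillis < waitMaxMillis / 2) {
            return WAITING;
        }
        if (latencyMillis < waitMaxMillis) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    /** Same aggregation as evaluateCheckerCompletionLocked(): the worst state wins. */
    static int worstState(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long waitMax = 60_000L; // hypothetical 60 s timeout
        int fg = completionState(true, 0, waitMax);       // check already finished
        int ui = completionState(false, 10_000, waitMax); // 10 s in: below half
        int io = completionState(false, 45_000, waitMax); // 45 s in: past half
        System.out.println(worstState(fg, ui, io));       // prints 2 (WAITED_HALF)
    }
}
```

With these inputs one checker past the halfway mark is enough to move the whole watchdog into WAITED_HALF, which is the point at which run() dumps stack traces before waiting for the second half of the timeout.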
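That completes the analysis. If the message posting is unclear, see my blog post on Handler and Looper. I drew on several blog posts found online while writing this. If anything here falls short, corrections are welcome.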