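Problem description:
After the test had been running for a while, CPU usage on the test client reached 100% and LoadRunner started reporting errors.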
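Analysis process:
I captured a thread dump (stack traces) of the test client.
The dump showed more than 1,000 threads, most of them in the BLOCKED state. The active threads were almost all NIO threads, so nothing obviously wrong stood out at first.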
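If you want a similar tally without an external dump tool, here is a minimal in-process sketch (an assumption, not what the author used) that counts the live threads of the current JVM by Thread.State. Note that Thread.State has no IN_NATIVE value: a thread executing native code, such as the NetworkInterface.getAll() frame in the dump, reports RUNNABLE.

```java
import java.util.EnumMap;
import java.util.Map;

// Sketch: count the live threads of the current JVM by Thread.State,
// the same kind of summary as scanning a thread dump for BLOCKED threads.
class ThreadStateSummary {

    static Map<Thread.State, Integer> summarize() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns a snapshot covering all live threads.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        summarize().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```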
Next, I searched the dump for the class names of the test code to see whether the test code itself was causing the problem.
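The same search can be done programmatically inside a JVM. This is a minimal sketch that lists every live thread with at least one stack frame from a given package prefix; the prefix com.example.perf. is hypothetical, since the real package name is masked in the dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: find live threads whose stack contains a frame from classes
// under a given package prefix (e.g. the hypothetical "com.example.perf.").
class StackSearch {

    static List<String> threadsInPackage(String classPrefix) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : e.getValue()) {
                if (frame.getClassName().startsWith(classPrefix)) {
                    Thread t = e.getKey();
                    hits.add(t.getName() + " (" + t.getState() + ") at " + frame);
                    break; // one matching frame per thread is enough
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        threadsInPackage("com.example.perf.").forEach(System.out::println);
    }
}
```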
Several threads running test code had stacks like the following:
Thread 1134: (state = IN_NATIVE)
 - java.net.NetworkInterface.getAll() @bci=0 (Compiled frame; information may be imprecise)
 - java.net.NetworkInterface.getNetworkInterfaces() @bci=0, line=334 (Compiled frame)
 - com.alibaba.rocketmq.remoting.common.RemotingUtil.getLocalAddress() @bci=0, line=112 (Compiled frame)
 - com.alibaba.rocketmq.client.ClientConfig.<init>() @bci=19, line=32 (Compiled frame)
 - com.alibaba.rocketmq.client.producer.DefaultMQProducer.<init>(java.lang.String, com.alibaba.rocketmq.remoting.RPCHook) @bci=1, line=95 (Compiled frame)
 - com.alibaba.rocketmq.client.producer.DefaultMQProducer.<init>(java.lang.String) @bci=3, line=86 (Compiled frame)
 - ********************MQProducer.<init>(java.lang.String, java.lang.String) @bci=71, line=62 (Compiled frame)
 - ********************.RocketMQ.sendMessage() @bci=76, line=119 (Compiled frame)    // 119 is the line number in the source file
 - ********************.RocketMQ$1$1.safeRun() @bci=7, line=53 (Compiled frame)
 - ********************.SafeRunnable.run() @bci=1, line=13 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
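These threads were inside java.net.NetworkInterface.getAll(), a method that is fairly CPU-intensive; I had run into a similar case before. Reading the stack from the bottom up: a thread-pool worker runs RocketMQ.sendMessage(), which constructs a new MQProducer; that constructs a DefaultMQProducer, whose ClientConfig constructor calls RemotingUtil.getLocalAddress(), which in turn enumerates all network interfaces. The next question was why several threads were stuck there at the same time.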
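To get a rough feel for this cost on a given machine, the sketch below times repeated calls to the public NetworkInterface.getNetworkInterfaces() (the frame just above getAll() in the stack). It is a crude wall-clock loop, not a rigorous benchmark, and the numbers vary widely with the OS and the number of interfaces. Judging from the stack, this cost is paid again every time a producer is constructed.

```java
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.Enumeration;

// Sketch: average wall-clock time of one NetworkInterface.getNetworkInterfaces() call.
class InterfaceScanTiming {

    static int lastCount; // keeps the result observable so the loop is not optimized away

    static long averageNanos(int iterations) throws SocketException {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
            int count = 0;
            // Older JDKs may return null when no interfaces are found.
            while (nics != null && nics.hasMoreElements()) {
                nics.nextElement();
                count++;
            }
            lastCount = count;
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) throws SocketException {
        long avg = averageNanos(200);
        System.out.println(lastCount + " interfaces, about " + avg / 1000 + " microseconds per call");
    }
}
```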
Here is the offending code; source line 119 is the line marked (1):
public void sendMessage() {
    try {
        // omitted…
        t = UserManager.getTransManager().CreateTransaction("Performace-client", trace);
        Message msg = new Message("Performace", msgContent.getBytes("UTF-8"));
        if (producer == null) {
            producer = new MQProducer("Performace", "192.168.143.135:9876");   // (1)
            producer.start();
            rst = producer.product(msg);
        } else {
            rst = producer.product(msg);
        }

        // omitted…

    } catch (Exception e) {
        // omitted…
        if (producer != null) {
            producer.shutdown();
            producer = null;
        }
    }
}
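Root cause
The sendMessage method is registered on a timer thread pool at random times, so several threads can run it at the same moment or within a very short interval.
producer.product(msg) sends the message to the remote side. If it throws an Exception, whether because of a network problem or for some other reason, the catch block shuts the producer down and sets producer to null.
The next call to sendMessage then creates the producer again. If several threads happen to run sendMessage concurrently at that point, they can all see producer == null and each build and start a new producer (repeating the network-interface scan seen in the stack), and further races follow: for example, one thread's catch block can shut down and null out a producer that another thread has just created. The result is a vicious cycle.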
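The duplicate-initialization part of this race can be reproduced in isolation. In this sketch, FakeProducer is a hypothetical stand-in for MQProducer whose constructor sleeps 50 ms to imitate expensive setup; eight pool threads are released together by a CountDownLatch and all run the same unguarded "if (producer == null)" check as the original code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Reproduces the unguarded lazy initialization from the original sendMessage().
class LazyInitRace {

    /** Hypothetical stand-in for MQProducer; counts how often it is constructed. */
    static class FakeProducer {
        FakeProducer(AtomicInteger counter) {
            counter.incrementAndGet();
            try {
                Thread.sleep(50); // imitates expensive setup such as the interface scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    final AtomicInteger constructed = new AtomicInteger();
    private FakeProducer producer; // shared and unguarded, as in the original code

    void sendMessage() {
        if (producer == null) {                       // several threads can pass this check...
            producer = new FakeProducer(constructed); // ...before any of them assigns the field
        }
    }

    /** Releases {@code threads} callers at once and returns how many producers were built. */
    static int run(int threads) throws InterruptedException {
        LazyInitRace client = new LazyInitRace();
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    go.await(); // every task waits here until the latch opens
                    client.sendMessage();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        go.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return client.constructed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("producers constructed: " + run(8));
    }
}
```

On most machines the count comes out greater than 1, though the exact number varies between runs. Judging from the code shown, in the real test each of those extra producers has also been started, and only the one left in the field is ever shut down.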
After the fix:
public void sendMessage() {
    try {
        // omitted…
        t = UserManager.getTransManager().CreateTransaction("Performace-client", trace);
        Message msg = new Message("Performace", msgContent.getBytes("UTF-8"));
        synchronized (this) {
            if (producer == null) {
                producer = new MQProducer("Performace", "192.168.143.135:9876");
                producer.start();
            }
        }
        rst = producer.product(msg);

        // omitted…

    } catch (Exception e) {
        // omitted…
        if (producer != null) {
            producer.shutdown();
            producer = null;
        }
    }
}
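Adding this synchronization, so that concurrent callers wait while one thread initializes the producer, solved the problem.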
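The fix moves the null check and the construction into synchronized (this), so only one thread can build the producer at a time. The sketch below applies the same idea to a hypothetical FakeProducer stand-in. As an assumption that goes beyond the original fix, it also copies the field into a local variable inside the lock and does the shutdown-and-reset under the same lock: in the fixed code, producer.product(msg) and the catch block still run outside synchronized, so another thread's catch block could in principle set producer to null in between.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of synchronized lazy initialization with a guarded reset.
class SynchronizedLazyInit {

    /** Hypothetical stand-in for MQProducer; counts how often it is constructed. */
    static class FakeProducer {
        FakeProducer(AtomicInteger counter) {
            counter.incrementAndGet();
            try {
                Thread.sleep(50); // imitates expensive setup such as the interface scan
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        void send(String msg) { /* no-op in this sketch */ }
        void shutdown() { /* no-op in this sketch */ }
    }

    final AtomicInteger constructed = new AtomicInteger();
    private FakeProducer producer; // guarded by "this"

    void sendMessage(String msg) {
        FakeProducer p;
        synchronized (this) {            // same lock as the fix above
            if (producer == null) {
                producer = new FakeProducer(constructed);
            }
            p = producer;                // local copy survives a concurrent reset
        }
        try {
            p.send(msg);
        } catch (RuntimeException e) {
            synchronized (this) {
                if (producer == p) {     // reset only the instance that actually failed
                    p.shutdown();
                    producer = null;
                }
            }
        }
    }

    /** Releases {@code threads} callers at once and returns how many producers were built. */
    static int run(int threads) throws InterruptedException {
        SynchronizedLazyInit client = new SynchronizedLazyInit();
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                try {
                    go.await();
                    client.sendMessage("hello");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        go.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return client.constructed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("producers constructed: " + run(8)); // prints "producers constructed: 1"
    }
}
```

With eight callers released together, run(8) constructs exactly one producer, because the lock serializes the check-then-create step.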
Author: No.40
Blog: http://www.cnblogs.com/no40