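Ever since I came across recon, it has impressed me every time I have had to track down a system bottleneck: locating a memory blow-up caused by non-tail recursion, for example, or finding the process that is pegging the CPU. It makes this kind of work feel effortless, and I am grateful for it every time.
recon is a tool released by ferd for diagnosing Erlang problems in production. It is more than a wrapper around the Erlang stdlib interfaces; it also provides functions for investigating memory fragmentation.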
CPU Statistics
In Erlang in Anger, also published by ferd, he writes:
The reduction count has a direct link to function calls in Erlang, and a high count is usually the synonym of a high amount of CPU usage.
What’s interesting with this function is to try it while a system is already rather busy, with a relatively short interval. Repeat it many times, and you should hopefully see a pattern emerge where the same processes (or the same kind of processes) tend to always come up on top.
Using the code locations and current functions being run, you should be able to identify what kind of code hogs all your schedulers.
The "this function" mentioned in the quote is:
1> recon:proc_window(reductions, 3, 500).
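In other words, the change in a process's reduction count over an interval is used as a proxy for how much CPU that process consumed during the interval.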
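To make the idea concrete, here is a minimal sketch of the same measurement using only the standard library. It is not recon's implementation; the module name red_window and the function top/2 are hypothetical names chosen for illustration. It takes two snapshots of every process's reduction count and ranks processes by how much the count grew.

```erlang
-module(red_window).
-export([top/2]).

%% Return the N processes whose reduction count grew the most over
%% IntervalMs milliseconds, as [{Pid, Delta}] sorted in descending order.
%% Processes that start or die during the interval are ignored.
top(N, IntervalMs) ->
    First = maps:from_list(sample()),
    timer:sleep(IntervalMs),
    Deltas = [{Pid, Reds - maps:get(Pid, First)}
              || {Pid, Reds} <- sample(), maps:is_key(Pid, First)],
    lists:sublist(lists:reverse(lists:keysort(2, Deltas)), N).

%% One snapshot of {Pid, Reductions} for every live process.
%% process_info/2 returns undefined for a dead process, which the
%% pattern in the generator silently filters out.
sample() ->
    [{Pid, Reds}
     || Pid <- erlang:processes(),
        {reductions, Reds} <- [erlang:process_info(Pid, reductions)]].
```

Calling red_window:top(3, 500) is roughly analogous to recon:proc_window(reductions, 3, 500), although recon also reports where each process is currently executing, which is what lets you map the top entries back to code.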
Memory Leaks
Memory leaks mostly involve refc binaries, which comes down to how binaries are laid out in memory; I touched on this in an earlier blog post.
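A small shell experiment illustrates the mechanism. This is only a sketch using the standard binary module; the variable names and sizes are arbitrary. A tiny slice matched out of a large binary is a sub-binary that keeps the entire original alive for as long as the slice is referenced.

```erlang
%% Build a 1 MB binary; binaries larger than 64 bytes are refc binaries
%% stored outside the process heap.
Big = binary:copy(<<"x">>, 1024 * 1024),
%% Match out a 10-byte slice; this is a sub-binary pointing into Big.
<<Small:10/binary, _/binary>> = Big,
10 = byte_size(Small),
%% The slice still references the whole 1 MB binary.
1048576 = binary:referenced_byte_size(Small).
```

If a long-lived process stores many such slices, each one can pin a large binary in memory even though only a few bytes are actually needed.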
ferd also offers some advice on how to fix them:
Once you’ve established you’ve got a binary memory leak using recon:bin_leak(Max), it should be simple enough to look at the top processes and see what they are and what kind of work they do.
Generally, refc binaries memory leaks can be solved in a few different ways, depending on the source:
• call garbage collection manually at given intervals (icky, but somewhat efficient);
• stop using binaries (often not desirable);
• use binary:copy/1-2 if keeping only a small fragment (usually less than 64 bytes) of a larger binary;
• move work that involves larger binaries to temporary one-off processes that will die when they’re done (a lesser form of manual GC!);
• or add hibernation calls when appropriate (possibly the cleanest solution for inactive processes).
The first two options are frankly not agreeable and should not be attempted before all else failed. The last three options are usually the best ones to be used.
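The first option is in fact what RabbitMQ does. The second is basically not feasible. The third is something to keep in mind whenever you write code. The fourth amounts to using short-lived processes wherever possible, as the Erlang VM encourages. The fifth, hibernating processes, is also something I covered in an earlier blog post.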
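The third and fourth options can be sketched in a few lines. The module name bin_fixes and both function names are hypothetical and exist only for illustration; the calls they rely on (binary:copy/1 and spawn_monitor/1) are standard.

```erlang
-module(bin_fixes).
-export([keep_fragment/2, in_worker/1]).

%% Option 3: copy a small fragment so that it no longer references
%% the large binary it was cut from.
keep_fragment(Big, Len) when is_binary(Big), byte_size(Big) >= Len ->
    <<Fragment:Len/binary, _/binary>> = Big,
    binary:copy(Fragment).

%% Option 4: run Fun in a temporary process and return its result.
%% When the worker exits, its heap and its binary references go with it.
%% The result should be small or copied: a refc binary returned as-is
%% is still shared with the caller by reference.
in_worker(Fun) when is_function(Fun, 0) ->
    {Pid, Ref} = spawn_monitor(fun() -> exit({ok, Fun()}) end),
    receive
        {'DOWN', Ref, process, Pid, {ok, Result}} -> Result;
        {'DOWN', Ref, process, Pid, Reason} -> error({worker_failed, Reason})
    end.
```

For the fifth option, a gen_server that is expected to sit idle can return {noreply, State, hibernate} from its callbacks, which garbage collects the process before it goes to sleep.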
Memory Fragmentation
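Memory fragmentation is closely tied to how the Erlang VM manages memory. In practice it looks like a memory leak: the most obvious symptom is that the memory usage reported by erlang:memory() is far lower than what the operating system reports (for example in top).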
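One way to check for this symptom from the shell is to compare what the VM thinks it is using with what its allocators have reserved. The calls below are a sketch that assumes recon is loaded on the node; the values depend entirely on the system, so no output is shown.

```erlang
%% Memory the VM accounts for, in bytes.
erlang:memory(total).
%% Memory actively used versus memory the allocators hold from the OS.
recon_alloc:memory(used).
recon_alloc:memory(allocated).
%% Ratio of used to allocated memory; a low value on a node with a
%% large footprint hints at fragmentation.
recon_alloc:memory(usage).
%% Per-allocator breakdown of how well carriers are being used.
recon_alloc:fragmentation(current).
```

A large gap between the used and allocated figures matches the mismatch between erlang:memory() and top described above.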
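Summary
recon is a very hands-on tool; without real-world use cases, it is hard to convey just how useful it is.
If you have any questions, feel free to raise them and we can work through them together.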