Kernel panic analysis (out-of-bounds memory write caused by the camera)

  This panic was found while running CTS tests. The panic log is as follows:

[ 2212.531425] c3 3279 (logcat) Unable to handle kernel paging request at virtual address 2b2c2c2b2b292a2a
[ 2212.541032] c3 3279 (logcat) pgd = ffffffc00d5f5000
[ 2212.545910] [2b2c2c2b2b292a2a] *pgd=0000000000000000
[ 2212.550992] c3 3279 (logcat) Internal error: Oops: 96000044 [#1] PREEMPT SMP
[ 2212.557983] Modules linked in: sd8777 mlan8777 audiostub cidatattydev gs_modem ccinetdev cci_datastub citty iml_module seh cploaddev msocketk tzdd galcore(O) [last unloaded: mbt8777]
[ 2212.574228] c3 3279 (logcat) CPU: 3 PID: 3279 Comm: logcat Tainted: G           O 3.10.33 #1
[ 2212.582601] c3 3279 (logcat) task: ffffffc0132d09c0 ti: ffffffc01ed20000 task.ti: ffffffc01ed20000
[ 2212.591495] c3 3279 (logcat) PC is at memcpy+0xc0/0x180
[ 2212.596680] c3 3279 (logcat) LR is at tty_insert_flip_string_fixed_flag+0x78/0xcc
[ 2212.604102] c3 3279 (logcat) pc : [<ffffffc000300180>] lr : [<ffffffc000367058>] pstate: 80000145
[ 2212.612903] c3 3279 (logcat) sp : ffffffc01ed23c90
[ 2212.617650] R29: ffffffc01ed23c90 R28: ffffffc01f8b8000
[ 2212.622930] R27: 0000000000000067 R26: 0000000000000000
[ 2212.628211] R25: 0000000000000700 R24: ffffffc0287cfe00
[ 2212.633493] R23: ffffffc02e9ee000 R22: 0000000000000067
[ 2212.638774] R21: 0000000000000067 R20: 0000000000000067
[ 2212.644055] R19: ffffffc00f0cbc00 R18: 0000000000000000
[ 2212.649346] R17: 0000000000000000 R16: ffffffc000192da4
[ 2212.654626] R15: 0000000000000000 R14: 00000000f6ff9eaf
[ 2212.659908] R13: 00000000fff32670 R12: 00000000fff32678
[ 2212.665189] R11: 00000000aac2ff00 R10: 0000000000000000
[ 2212.670470] R9 : 00000001ffffffff R8 : 362e30333a31313a
[ 2212.675752] R7 : 35302030322d3131 R6 : 2b2c2c2b2b292a2a
[ 2212.681032] R5 : ffffffc000361ac8 R4 : 0000000000000000
[ 2212.686322] R3 : 2b2c2c2b2b292a2a R2 : ffffffffffffffe7
[ 2212.691604] R1 : ffffffc02e9ee010 R0 : 2b2c2c2b2b292a2a

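  The key information is in just a few lines of the log above: the faulting address, PC, LR, and R0. The kernel faulted while trying to access a very strange address (2b2c2c2b2b292a2a), and R0 holds exactly the same value. On ARM, R0 is normally used to pass the first argument of a function, so let's analyze PC and LR to get more information.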
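The "Oops: 96000044" value in the log can also be decoded. The sketch below assumes this number is the raw ESR_EL1 syndrome value and splits it into the fields used for data aborts; the struct and function names are made up for illustration. Under that assumption the value decodes to a data abort taken at the same exception level, caused by a write, with a level-0 translation fault, which is consistent with memcpy faulting on its destination pointer (R0).

```c
#include <assert.h>
#include <stdint.h>

/* Selected fields of an AArch64 ESR_ELx value (names are illustrative). */
struct esr_fields {
    unsigned ec;    /* exception class, bits [31:26] */
    unsigned il;    /* instruction length bit, bit 25 */
    unsigned wnr;   /* bit 6: 1 = write, 0 = read (data aborts only) */
    unsigned dfsc;  /* data fault status code, bits [5:0] */
};

/* Split a raw syndrome value into the fields above. */
struct esr_fields decode_esr(uint32_t esr)
{
    struct esr_fields f;
    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.wnr  = (esr >> 6) & 0x1;
    f.dfsc = esr & 0x3f;
    return f;
}

/* True for a data abort from the current EL (EC 0x25) that is a write
 * and a translation fault (DFSC 0b0001LL, LL = table level). */
int is_write_translation_fault(uint32_t esr)
{
    struct esr_fields f = decode_esr(esr);
    return f.ec == 0x25 && f.wnr == 1 && (f.dfsc & 0x3c) == 0x04;
}
```

Calling decode_esr(0x96000044) gives ec = 0x25, wnr = 1 and dfsc = 0x04.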

  Use the addr2line tool to find the code location at the time of the panic:

aarch64-linux-gnu-addr2line -e vmlinux ffffffc000300180
??:?
aarch64-linux-gnu-addr2line -e vmlinux ffffffc000367058
/home/buildfarm/aabs/src.pxa1928-kk4.4.beta2/kernel/drivers/tty/tty_buffer.c:269

  PC could not be resolved, but LR could:

int tty_insert_flip_string_fixed_flag(struct tty_port *port,
        const unsigned char *chars, char flag, size_t size)
{
    int copied = 0;
    do {
        int goal = min_t(size_t, size - copied, TTY_BUFFER_PAGE);
        int space = tty_buffer_request_room(port, goal);
        struct tty_buffer *tb = port->buf.tail;
        /* If there is no space then tb may be NULL */
        if (unlikely(space == 0)) {
            break;
        }
        memcpy(tb->char_buf_ptr + tb->used, chars, space);  /* the panic happens inside this call */
        memset(tb->flag_buf_ptr + tb->used, flag, space);   /* tty_buffer.c:269, where LR points */
        tb->used += space;
        copied += space;
        chars += space;
        /* There is a small chance that we need to split the data over
           several buffers. If this is the case we must loop */
    } while (unlikely(size > copied));
    return copied;
}

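  It is easy to see that the kernel faulted while executing memcpy (the log itself says "PC is at memcpy+0xc0/0x180"). addr2line most likely could not resolve PC because memcpy is a library routine, which on arm64 is written in assembly and so has no source-line debug information. R0 should therefore hold the destination, tb->char_buf_ptr + tb->used. Let's disassemble this function to look for more clues.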

ffffffc000366fe0 <tty_insert_flip_string_fixed_flag>:
ffffffc000366fe0:       a9ba7bfd        stp     x29, x30, [sp,#-96]!
ffffffc000366fe4:       910003fd        mov     x29, sp
ffffffc000366fe8:       a90363f7        stp     x23, x24, [sp,#48]
ffffffc000366fec:       a9046bf9        stp     x25, x26, [sp,#64]
ffffffc000366ff0:       a9025bf5        stp     x21, x22, [sp,#32]
ffffffc000366ff4:       f9002bfb        str     x27, [sp,#80]
ffffffc000366ff8:       a90153f3        stp     x19, x20, [sp,#16]
ffffffc000366ffc:       aa0003f8        mov     x24, x0
ffffffc000367000:       aa0103f7        mov     x23, x1
ffffffc000367004:       53001c5a        uxtb    w26, w2
ffffffc000367008:       aa0303fb        mov     x27, x3
ffffffc00036700c:       52800015        mov     w21, #0x0                       // #0
ffffffc000367010:       d2800004        mov     x4, #0x0                        // #0
ffffffc000367014:       d280e019        mov     x25, #0x700                     // #1792
ffffffc000367018:       cb040361        sub     x1, x27, x4
ffffffc00036701c:       f11c003f        cmp     x1, #0x700
ffffffc000367020:       9a999021        csel    x1, x1, x25, ls
ffffffc000367024:       aa1803e0        mov     x0, x24
ffffffc000367028:       97ffff44        bl      ffffffc000366d38 <tty_buffer_request_room>
ffffffc00036702c:       93407c16        sxtw    x22, w0
ffffffc000367030:       2a0003f4        mov     w20, w0
ffffffc000367034:       aa1703e1        mov     x1, x23
ffffffc000367038:       aa1603e2        mov     x2, x22
ffffffc00036703c:       f9401b13        ldr     x19, [x24,#48]
ffffffc000367040:       34000260        cbz     w0, ffffffc00036708c <tty_insert_flip_string_fixed_flag+0xac>
ffffffc000367044:       f9400663        ldr     x3, [x19,#8]   // x3 = tb->char_buf_ptr
ffffffc000367048:       b9801a60        ldrsw   x0, [x19,#24]  // x0 = tb->used (sign-extended)
ffffffc00036704c:       0b1402b5        add     w21, w21, w20
ffffffc000367050:       8b000060        add     x0, x3, x0
ffffffc000367054:       97fe641b        bl      ffffffc0003000c0 <memcpy> // the kernel panics inside this call
ffffffc000367058:       f9400a62        ldr     x2, [x19,#16]  // this is LR, the return address
ffffffc00036705c:       b9801a60        ldrsw   x0, [x19,#24]
ffffffc000367060:       2a1a03e1        mov     w1, w26
ffffffc000367064:       8b000040        add     x0, x2, x0
ffffffc000367068:       aa1603e2        mov     x2, x22
ffffffc00036706c:       97fe64d5        bl      ffffffc0003003c0 <memset>
ffffffc000367070:       b9401a60        ldr     w0, [x19,#24]
ffffffc000367074:       93407ea4        sxtw    x4, w21
ffffffc000367078:       0b140014        add     w20, w0, w20
ffffffc00036707c:       b9001a74        str     w20, [x19,#24]
ffffffc000367080:       eb04037f        cmp     x27, x4
ffffffc000367084:       8b1602f7        add     x23, x23, x22
ffffffc000367088:       54fffc88        b.hi    ffffffc000367018 <tty_insert_flip_string_fixed_flag+0x38>
ffffffc00036708c:       2a1503e0        mov     w0, w21
ffffffc000367090:       a94153f3        ldp     x19, x20, [sp,#16]
ffffffc000367094:       a9425bf5        ldp     x21, x22, [sp,#32]
ffffffc000367098:       a94363f7        ldp     x23, x24, [sp,#48]
ffffffc00036709c:       a9446bf9        ldp     x25, x26, [sp,#64]

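  The disassembly shows that X0 (that is, R0) is computed as X3 + X0, and that both X3 and X0 are loaded from X19 plus a small offset (#8 and #24). Comparing this with the C code, X19 must be the struct tty_buffer pointer (tb), which is defined as follows: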

struct tty_buffer {
    struct tty_buffer *next;
    char *char_buf_ptr;
    unsigned char *flag_buf_ptr;
    int used;
    int size;
    int commit;
    int read;
    /* Data points here */
    unsigned long data[0];
};
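To double-check that X19 really is the tty_buffer pointer, the field offsets can be compared with the offsets used in the disassembly. The following is a minimal sketch, not kernel code: tty_buffer_arm64 is a hypothetical mirror of the struct in which pointers are modeled as uint64_t, so the offsets match the arm64 (LP64) layout on any host, and memcpy_dest reproduces what the ldr / ldrsw / add sequence computes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct tty_buffer as laid out on arm64. */
struct tty_buffer_arm64 {
    uint64_t next;          /* struct tty_buffer *next        */
    uint64_t char_buf_ptr;  /* char *char_buf_ptr             */
    uint64_t flag_buf_ptr;  /* unsigned char *flag_buf_ptr    */
    int32_t  used;
    int32_t  size;
    int32_t  commit;
    int32_t  read;
    uint64_t data[];        /* unsigned long data[0]          */
};

/* Returns 1 if the field offsets match those seen in the disassembly. */
int layout_matches_disassembly(void)
{
    return offsetof(struct tty_buffer_arm64, char_buf_ptr) == 8   /* ldr   x3, [x19,#8]  */
        && offsetof(struct tty_buffer_arm64, flag_buf_ptr) == 16  /* ldr   x2, [x19,#16] */
        && offsetof(struct tty_buffer_arm64, used) == 24;         /* ldrsw x0, [x19,#24] */
}

/* Destination passed to memcpy: char_buf_ptr + used, with used
 * sign-extended to 64 bits the way ldrsw does it. */
uint64_t memcpy_dest(const struct tty_buffer_arm64 *tb)
{
    assert(tb != NULL);
    return tb->char_buf_ptr + (uint64_t)(int64_t)tb->used;
}
```

With used equal to 0, the destination is simply char_buf_ptr, so a corrupted char_buf_ptr ends up in R0 unchanged.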

  From the panic log above we know that X19 is ffffffc00f0cbc00. Use the crash tool to look at what is stored at that address:

crash> struct tty_buffer 0xffffffc00f0cbc00
struct tty_buffer {
  next = 0x0,
  char_buf_ptr = 0x2b2c2c2b2b292a2a <Address 0x2b2c2c2b2b292a2a out of bounds>,
  flag_buf_ptr = 0x2a2a2a2b2c2a2b2b <Address 0x2a2a2a2b2c2a2b2b out of bounds>,
  used = 0,
  size = 690695211,
  commit = 0,
  read = 0,
  data = 0xffffffc00f0cbc28
}

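  Bingo, this is where the strange address comes from: it really is stored in the tty_buffer. Why does a field that should hold a normal kernel address now hold such a strange value? Most likely the memory was overwritten. The value also seems to follow a pattern, so, on a hunch, let's look at the memory around this buffer.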
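One way to make the pattern visible is to split the suspicious values into bytes in memory order (arm64 here is little-endian, so the lowest byte comes first). The helpers below are an illustrative sketch; their names are not from any tool. The pointer 0x2b2c2c2b2b292a2a becomes the bytes 2a 2a 29 2b 2b 2c 2c 2b, and size = 690695211 is 0x292b2c2b, so both consist of bytes clustered around 0x2a. By contrast, R7 and R8 in the panic log decode to "11-20 05" and ":11:30.6", which look like the start of a logcat line, presumably source data that memcpy had already loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store the 8 bytes of a 64-bit word in little-endian memory order. */
void word_to_bytes(uint64_t word, unsigned char out[8])
{
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(word >> (8 * i));
}

/* Render the word the way a hex dump's ASCII column would:
 * printable bytes as-is, everything else as '.'.
 * out must have room for 8 characters plus the terminator. */
void word_to_ascii(uint64_t word, char out[9])
{
    unsigned char b[8];

    assert(out != NULL);
    word_to_bytes(word, b);
    for (int i = 0; i < 8; i++)
        out[i] = (b[i] >= 0x20 && b[i] < 0x7f) ? (char)b[i] : '.';
    out[8] = '\0';
}
```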

  

rd 0xffffffc00f0cac00 1000
...
ffffffc00f0cb420:  000000a8000000a8 35302030322d3131   ........11-20 05
ffffffc00f0cb430:  362e30333a31313a 3634353520203333   :11:30.633  5546
ffffffc00f0cb440:  4420323836352020 6d61436c76724d20     5682 D MrvlCam
ffffffc00f0cb450:  6e69676e45617265 69666e6f43203a65   eraEngine: Confi
ffffffc00f0cb460:  696c657069505f67 6e6f4365203a656e   g_Pipeline: eCon
ffffffc00f0cb470:  2c5d305b74786574 6172656d61436520   text[0], eCamera
ffffffc00f0cb480:  5b646e616d6d6f43 654e62202c5d3131   Command[11], bNe
ffffffc00f0cb490:  305b326e69426465 6c69745362202c5d   edBin2[0], bStil
ffffffc00f0cb4a0:  507463656666416c 6950776569766572   lAffectPreviewPi
ffffffc00f0cb4b0:  305b656e696c6570 62616e4562202c5d   peline[0], bEnab
ffffffc00f0cb4c0:  756f53524444656c 0a0d5d305b656372   leDDRSource[0]..
ffffffc00f0cb4d0:  6e69676e45617265 6f74535f5b203a65   eraEngine: [_Sto
ffffffc00f0cb4e0:  206d616572745370 657250203e2d2d2d   pStream ---> Pre
ffffffc00f0cb4f0:  726f502077656976 61430a0d0a0d5d74   view Port]....Ca
ffffffc00f0cb500:  647261486172656d 6573614265726177   meraHardwareBase
ffffffc00f0cb510:  6c62616e652d203a 6570795467734d65   : -enableMsgType
ffffffc00f0cb520:  300a0d5d0a0d7820 0000000000000000    x..]..0........

...
ffffffc00f0cc530:  e7e7e6e7e7e7e7e7 dcdce1e5e6e6e6e6   ................
ffffffc00f0cc540:  dae2e3d9cac2d7e3 dfe3dec7ccd1dde0   ................
ffffffc00f0cc550:  dbcbc8cac9c5d2d7 e0dfcbc0c9d9dbd9   ................
ffffffc00f0cc560:  d5cec1c2cfdcd8dc cbc7c7c0d4d8d7db   ................
ffffffc00f0cc570:  d0b5bbc0ced3dbd4 dedfdfdedbd6dbdc   ................
ffffffc00f0cc580:  dedededad4d7dede dbd9dcdedfdfdfde   ................
ffffffc00f0cc590:  dddddedddddedddd c3c1c3c9d0d9dddc   ................
ffffffc00f0cc5a0:  dbdbdcdadbdbdad3 d9dadbdad8cbcdd8   ................
ffffffc00f0cc5b0:  dbdbdbdadbd9dada dbdadbdbdbdbdcdb   ................
ffffffc00f0cc5c0:  ccc3d3d9d9dadadb d1d1d1d1d2d1cfcd   ................
ffffffc00f0cc5d0:  d1d1d2d2d1d1d1d0 bfcbd3d5d5d4d3d2   ................
ffffffc00f0cc5e0:  d1c9c3ccd2d0c4c0 b5b5b7b8b8c9d2d3   ................
ffffffc00f0cc5f0:  444f7ca6b1b4b5b4 3a3a363536373b3e   .....|OD>;7656::
ffffffc00f0cc600:  2a2a2b2c2c2b2c2b 2b2c2a2b2a292b2a   +,+,,+***+)*+*,+
ffffffc00f0cc610:  2b2b2a292b2b2a2a 282a2a2a2c2c2b2b   **++)*++++,,***(
ffffffc00f0cc620:  1414161f25282828 1314121312131413   (((%............
ffffffc00f0cc630:  1414131313141411 1918171616151614   ................
ffffffc00f0cc640:  1b19191919181717 28262423211f1d1c   ...........!#$&(
ffffffc00f0cc650:  34353433302f2d2a 3e3d3d3d3c3b3636   *-/0345466;<===>
ffffffc00f0cc660:  333434373a3c3c3d 3132313231303133   =<<:744331012121
ffffffc00f0cc670:  2c2c2d2e2f2e2e2e 2628292929292a2a   .../.-,,**))))(&
ffffffc00f0cc680:  2b2b2b292a292728 2c2d2d2d2d2e2d2c   (')*)+++,-.----,
ffffffc00f0cc690:  2a2c2b2d2d2d2d2c 282828292829292b   ,----+,*+))()(((
ffffffc00f0cc6a0:  2626252725252728 2728292828262625   ('%%'%&&%&&(()('
ffffffc00f0cc6b0:  2c2c2c2c2c2b2928 2d2b2b2b2d2d2d2c   ()+,,,,,,---+++-
ffffffc00f0cc6c0:  2c2b2b2b2c2b2c2c 2c2c2c2b2c2c2a2c   ,,+,+++,,*,,+,,,
ffffffc00f0cc6d0:  2c2c2b2b2c2b2b2c 2e2e2e2e2e2c2c2d   ,++,++,,-,,.....
ffffffc00f0cc6e0:  2f2f2f302e2f2f2e 3b38373432312f2f   .//.0/////12478;
ffffffc00f0cc6f0:  454545444543423f 4340404041444445   ?BCEDEEEEDDA@@@C
ffffffc00f0cc700:  4647474847454243 4342424143444446   CBEGHGGFFDDCABBC
ffffffc00f0cc710:  4a48474546464345 8a837a736b635b51   ECFFEGHJQ[cksz..
ffffffc00f0cc720:  9c9c9b9a9695938e 9f9f9f9e9e9d9d9c   ................
ffffffc00f0cc730:  a0a1a09f9f9f9e9e 9f9f9ea0a0a1a0a1   ................
ffffffc00f0cc740:  a09f9e9e9fa0a09e a0a0a0a0a0a0a0a0   ................
ffffffc00f0cc750:  a1a0a0a2a1a1a0a0 a0a0a1a1a0a0a0a0   ................
ffffffc00f0cc760:  a3a2a2a2a1a1a1a1 9fa0a0a0a1a2a1a2   ................
ffffffc00f0cc770:  9f9e9e9e9f9fa09f a1a0a1a09fa09f9e   ................
ffffffc00f0cc780:  9fa0a0a1a1a1a0a0 a0a0a0a1a1a0a09f   ................
ffffffc00f0cc790:  a0a1a1a0a0a0a0a0 9fa0a0a1a0a1a19f   ................
ffffffc00f0cc7a0:  9e9e9f9e9f9ea09f 9b9c9c9c9c9d9e9e   ................

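  This is a major clue. The nearby memory contains camera-related strings, and most of the rest looks very similar to the value in R0: data that changes gradually from byte to byte. Camera image data is exactly that kind of gradient, so I asked the camera team to take a look, and they confirmed that this is image data. As for why it overwrote the tty_buffer memory: the camera side recently enabled the MMU as well, and there is clearly a bug in that code. What exactly the bug is remains unknown for now; I will add it here once it has been tracked down.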
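The "gradual change" impression can be put into a rough number. The sketch below is only a heuristic of my own, not something the crash tool provides: it computes the mean absolute difference between neighbouring bytes. Smooth image data tends to score low, while text or pointers score noticeably higher.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Mean absolute difference between consecutive bytes of buf.
 * Low values suggest smoothly varying data such as image pixels. */
double mean_abs_delta(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;

    assert(buf != NULL && len >= 2);
    for (size_t i = 1; i < len; i++)
        total += (unsigned long)abs((int)buf[i] - (int)buf[i - 1]);
    return (double)total / (double)(len - 1);
}
```

For example, the eight bytes at ffffffc00f0cc600 (2b 2c 2b 2c 2c 2b 2a 2a) average well under 1 per step, while the eight log-text bytes "11-20 05" average more than 5.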

Time: 2024-10-12 21:18:02
