MTK Sensor越界导致的系统重启问题分析报告

【NE现场】

打开12306应用后做一些操作,和容易出现系统重启。dropbox中有好多system_server的tombstone文件:

./[email protected]1449222028760.txt:12:pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
./[email protected]1449455808867.txt:12:pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
./[email protected]1449222028730.txt:12:pid: 10466, tid: 10494, name: ActivityManager  >>> system_server <<<
./[email protected]1449455808843.txt:12:pid: 5992, tid: 6014, name: SensorService  >>> system_server <<<
./[email protected]1449457509508.txt:12:pid: 11012, tid: 11887, name: Binder_E  >>> system_server <<<
./[email protected]1449229865122.txt:12:pid: 18238, tid: 18260, name: SensorService  >>> system_server <<<

可以看到每次crash的线程都不一样!甚至backtrace也不一样:

@[email protected]
pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000070
...
backtrace:
    #00 pc 0000000000029f70  /system/lib64/libbinder.so (android::IPCThreadState::flushCommands()+4)
    #01 pc 0000000000009c60  /data/dalvik-cache/arm64/[email protected]@boot.oat
@[email protected]
pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x553d89e3c0
...
backtrace:
    #00 pc 0000000000030bc4  /system/lib64/libbinder.so (int android::Parcel::writeAligned<int>(int)+80)
    #01 pc 00000000000d3e0c  /system/lib64/libandroid_runtime.so
    #02 pc 0000000000109630  /data/dalvik-cache/arm64/[email protected]@boot.oat
@[email protected]
pid: 10466, tid: 10494, name: ActivityManager  >>> system_server <<<
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x7f0000000a
...
backtrace:
    #00 pc 0000007f0000000a  <unknown>
@[email protected]
pid: 5992, tid: 6014, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000068
backtrace:
    #00 pc 000000000000f508  /system/lib64/libsensorservice.so
    #01 pc 0000000000010e90  /system/lib64/libsensorservice.so
    #02 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #03 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #04 pc 0000000000017224  /system/lib64/libutils.so
    #05 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)    #06 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)
@[email protected]
pid: 11012, tid: 11887, name: Binder_E  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0
backtrace:
    #00 pc 0000000000014070  /system/lib64/libutils.so (android::RefBase::decStrong(void const*) const+236)
    #01 pc 000000000002f694  /system/lib64/libbinder.so (android::Parcel::releaseObjects()+84)
    #02 pc 000000000002f6e8  /system/lib64/libbinder.so (android::Parcel::freeDataNoInit()+60)
    #03 pc 000000000002f744  /system/lib64/libbinder.so (android::Parcel::ipcSetDataReference(unsigned char const*, unsigned long, unsigned long long const*, unsigned long, void (*)(android::Parcel*, unsigned char const*, unsigned long, unsigned long long const*, unsigned long, void*), void*)+40)
    #04 pc 000000000002a44c  /system/lib64/libbinder.so (android::IPCThreadState::executeCommand(int)+700)
    #05 pc 000000000002a6c8  /system/lib64/libbinder.so (android::IPCThreadState::getAndExecuteCommand()+92)
    #06 pc 000000000002a73c  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+76)
    #07 pc 0000000000031d68  /system/lib64/libbinder.so
    #08 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #09 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #10 pc 0000000000017224  /system/lib64/libutils.so
    #11 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #12 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)
@[email protected]
pid: 18238, tid: 18260, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7f3d798d38
backtrace:
    #00 pc 000000000000dbc0  /system/lib64/libsensorservice.so
    #01 pc 000000000000ed44  /system/lib64/libsensorservice.so
    #02 pc 000000000000f3a8  /system/lib64/libsensorservice.so
    #03 pc 0000000000011078  /system/lib64/libsensorservice.so
    #04 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #05 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #06 pc 0000000000017224  /system/lib64/libutils.so
    #07 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #08 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

这种backtrace都不一样的问题很可能就是内存问题了,所谓内存问题指的就是野指针或内存越界。

【问题分析】

分析内存问题的第一步就是排查NE现场寄存器指向的内存值附近有没有规律。比如:

@[email protected]
pid: 10466, tid: 10493, name: android.bg  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000070
    x0   000000557e996710  x1   0000000a00000068  x2   0000007f79c31ba0  x3   0000000000000000
    x4   000000006fc46a80  x5   0000000000000001  x6   0000000000000000  x7   000000557e4f527c
    x8   0000000000000000  x9   000000557e4f5278  x10  0000000000000000  x11  0000000000000000
    x12  0000000000000000  x13  0000000000430000  x14  0000000000550000  x15  0000000000430000
    x16  0000007f8f640320  x17  0000007f8eff4f6c  x18  0000007f8c1d0470  x19  000000000000000a
    x20  0000007f8f590e9c  x21  000000557e9b14d0  x22  000000001354bf40  x23  000000006fdf61f0
    x24  0000007f79c31b20  x25  0000000012da2040  x26  0000000000000000  x27  00000000000f52be
    x28  0000000000000000  x29  0000000000358c82  x30  0000000072639c64
    sp   0000007f79c31680  pc   0000007f8eff4f70  pstate 0000000080000000
backtrace:
    #00 pc 0000000000029f70  /system/lib64/libbinder.so (android::IPCThreadState::flushCommands()+4)
    #01 pc 0000000000009c60  /data/dalvik-cache/arm64/[email protected]@boot.oat

查看#00层代码:

$ aarch64-linux-android-objdump -D symbols/system/lib64/libbinder.so
0000000000029f6c <_ZN7android14IPCThreadState13flushCommandsEv>:
   29f6c:       f9400001        ldr     x1, [x0]
=> 29f70:       b9400822        ldr     w2, [x1,#8]
   29f74:       6b1f005f        cmp     w2, wzr
   29f78:       5400006d        b.le    29f84 <_ZN7android14IPCThreadState13flushCommandsEv+0x18>
   29f7c:       52800001        mov     w1, #0x0                        // #0
   29f80:       17ffd9ec        b       20730 <[email protected]>
   29f84:       d65f03c0        ret

x1值是x0地址中取来的,0x0000000a00000068显然不是一个合法地址。可能是x0地址被覆盖了。

查看x0附近的内存值:

memory near x0:
    000000557e9966f0 0000007f8f01e748 0000000000000000
    000000557e996700 0000000000000000 0000000000000007
    000000557e996710 0000000a00000068 0000007f0000000a
    000000557e996720 00000438234cda2a 3de6de183dc20f78
    000000557e996730 000000003d9e2680 0000000000000008
    000000557e996740 0000000000000000 0000000000000000
    000000557e996750 0000000000000000 0000000000000000
    000000557e996760 0000000000010001 0000000000000000
    000000557e996770 000000557e996840 0000000a00000068
    000000557e996780 000000550000000a 000004382647caaa
    000000557e996790 3de6de183dc20f78 000000003d9e2680
    000000557e9967a0 0000000000000000 0000000000000000
    000000557e9967b0 0000000000000000 0000000000000000
    000000557e9967c0 0000007f8e010001 0000000000000000
    000000557e9967d0 0000000000000000 000028e200000040
    000000557e9967e0 0000000a00000068 000000550000000a

发现一个明显的规律:

0x0000000a00000068出现了多次,而且间隔都是13*8=104字节,每个块的结构都很相似,很可能这个数据是个结构体数组。

继续分析下一个tombstone:

@[email protected]
pid: 5992, tid: 6053, name: AlarmManager  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x553d89e3c0
    x0   00000055837f4dc0  x1   0000000000000004  x2   0000000000000440  x3   0000000000020000
    x4   0000000000000444  x5   000000553d89df80  x6   0000000000000000  x7   000000558336b27c
    x8   0000000000000000  x9   000000558336b278  x10  0000000000000000  x11  0000000000000000
    x12  0000000000000000  x13  0000000000430000  x14  0000000000550000  x15  0000000000430000
    x16  0000007f7e502478  x17  0000007f7e4dbb74  x18  0000007f7b6b0470  x19  00000055837f4dc0
    x20  000000008080005c  x21  0000005583838590  x22  000000008080005c  x23  000000006fd15b08
    x24  000000008080005c  x25  000000003000003a  x26  0000000012d5fe80  x27  0000000012c47400
    x28  000000000000005c  x29  0000007f68090bd0  x30  0000007f7ea66e10
    sp   0000007f68090bd0  pc   0000007f7e4dbbc4  pstate 0000000080000000

 backtrace:
    #00 pc 0000000000030bc4  /system/lib64/libbinder.so (int android::Parcel::writeAligned<int>(int)+80)
    #01 pc 00000000000d3e0c  /system/lib64/libandroid_runtime.so
    #02 pc 0000000000109630  /data/dalvik-cache/arm64/[email protected]@boot.oat

查看#00层代码:

$ aarch64-linux-android-objdump -D symbols/system/lib64/libbinder.so
0000000000030b74 <_ZN7android6Parcel12writeAlignedIiEEiT_>:
   30b74:       a9be7bfd        stp     x29, x30, [sp,#-32]!
   30b78:       910003fd        mov     x29, sp
   30b7c:       a90153f3        stp     x19, x20, [sp,#16]
   30b80:       aa0003f3        mov     x19, x0
   30b84:       2a0103f4        mov     w20, w1
   30b88:       f9401002        ldr     x2, [x0,#32]
   30b8c:       f9400c03        ldr     x3, [x0,#24]
   30b90:       91001044        add     x4, x2, #0x4
   30b94:       eb03009f        cmp     x4, x3
   30b98:       54000109        b.ls    30bb8 <_ZN7android6Parcel12writeAlignedIiEEiT_+0x44>
   30b9c:       d2800081        mov     x1, #0x4                        // #4
   30ba0:       97ffbbe4        bl      1fb30 <[email protected]>
   30ba4:       34000080        cbz     w0, 30bb4 <_ZN7android6Parcel12writeAlignedIiEEiT_+0x40>
   30ba8:       a94153f3        ldp     x19, x20, [sp,#16]
   30bac:       a8c27bfd        ldp     x29, x30, [sp],#32
   30bb0:       d65f03c0        ret
   30bb4:       f9401262        ldr     x2, [x19,#32]
   30bb8:       f9400665        ldr     x5, [x19,#8]
   30bbc:       aa1303e0        mov     x0, x19
   30bc0:       d2800081        mov     x1, #0x4                        // #4
=> 30bc4:       b82268b4        str     w20, [x5,x2]
   30bc8:       a94153f3        ldp     x19, x20, [sp,#16]
   30bcc:       a8c27bfd        ldp     x29, x30, [sp],#32
   30bd0:       17ffbf3c        b       208c0 <[email protected]>

NE的原因是x5值非法,而x5是从x19中取出来的,查看x19值:

memory near x19:
    00000055837f4da0 0000000200000004 0000000a00000068
    00000055837f4db0 000000550000000a 0000015263c04a64
    00000055837f4dc0 3d5762703e10ca1c 000000553d89df80
    00000055837f4dd0 0000000000000440 0000000000020000
    00000055837f4de0 0000000000000440 0000000000000000
    00000055837f4df0 0000000000000000 0000000000000000
    00000055837f4e00 0000000000000000 0000000000010001
    00000055837f4e10 0000000a00000068 000000550000000a
    00000055837f4e20 0000015266bb3ae4 3d5762703e10ca1c
    00000055837f4e30 0000007f3d89df80 000000558379c020
    00000055837f4e40 000000558382ea70 0000656c62617300
    00000055837f4e50 0000000000000000 00000000000000d3
    00000055837f4e60 0000007f7eb1dc28 0000000000000000
    00000055837f4e70 0000007f7c8f2348 0000000a00000068
    00000055837f4e80 000000020000000a 0000015269b62b64
    00000055837f4e90 3d5762703e10ca1c 000000003d89df80

被破坏的现场合前面一个现场几乎相同!

后面几个就直接在tombstone里搜索0000000a00000068,发现每一个都是有0000000a00000068, 且间隔都是104字节。

@[email protected]
memory near x1:
    000000557e996a50 0000000a00000068 000000000000000a
    000000557e996a60 000004383b245e2a 3de6de183dc20f78
    000000557e996a70 000000003d9e2680 0000000000000000
    000000557e996a80 0000000000000007 00000000000f8327
    000000557e996a90 0000007f89b05070 0000000000000000
    000000557e996aa0 0000007f79a26000 0000007f00000000
    000000557e996ab0 0000000000000000 0000000a00000068
    000000557e996ac0 000000000000000a 000004383e1f4eaa
    000000557e996ad0 3de6de183dc20f78 000000003d9e2680
    000000557e996ae0 0000000012d94a90 0000000000000000
    000000557e996af0 0000007f79a24000 0000000000103000
    000000557e996b00 0000000000000000 0000000000000000
    000000557e996b10 0000000000000000 0000000000000000
    000000557e996b20 0000000a00000068 000000000000000a
    000000557e996b30 00000438411a3f2a 3de6de183dc20f78
    000000557e996b40 000000553d9e2680 000000557e915720
@[email protected]
memory near x9:
    000000558380c828 000000007614b620 7461003b72656e65
    000000558380c838 0000000000000000 000000558380d530
    000000558380c848 0000000a00000068 000000550000000a
    000000558380c858 0000015d3d53dc64 3d5762703e10ca1c
    000000558380c868 000000003d89df80 0000007f7c8f1fd8
    000000558380c878 000000020000008d 0000000200000004
    000000558380c888 00000000753a770c 0000000000000000
    000000558380c898 0000000000000234 0000000000000000
    000000558380c8a8 0000000000000000 0000000000000000
    000000558380c8b8 0000000000000000 0000007f7c7f9860
    000000558380c8c8 0000000000000101 00000000753a770c
    000000558380c8d8 0000000000000000 0000000000000234
    000000558380c8e8 0000000000000000 0000000000000000
    000000558380c8f8 0000000000000000 000000558337a320
    000000558380c908 0000000000000001 00000000d6000025
    000000558380c918 0000000000000000 0000000000000000
@[email protected]
memory near x19:
    0000005594c52190 3d7e9cb83e33b9be 000000003d567500
    0000005594c521a0 0000000000000000 0000000000000000
    0000005594c521b0 0000000100000000 0000000000000000
    0000005594c521c0 0000000000000000 0000000000000160
    0000005594c521d0 0000000000000000 0000007f9d76b470
    0000005594c521e0 0000000a00000068 000000110000000a
    0000005594c521f0 000001d7e54f0ade 3d7e9cb83e33b9be
    0000005594c52200 000000003d567500 0000000000000000
    0000005594c52210 0000000000000000 0000005594bc6700
    0000005594c52220 0000000000000000 0000000000000000
    0000005594c52230 0000000000000081 0000005500000000
    0000005594c52240 0000007f9e6baf60 0000000a00000068
    0000005594c52250 0000007f0000000a 000001d7e849fb5e
    0000005594c52260 3d7e9cb83e33b9be 0000007f3d567500
    0000005594c52270 0000007f9e6baf00 0000000000110000
    0000005594c52280 0000000000000000 0000000000000000
@[email protected]
memory near x0:
    0000005599644de0 0000000a00000068 000000550000000a
    0000005599644df0 0000066c78c6a111 3d8b65883e0ed844
    0000005599644e00 0000007f3d798d00 0000005599648310
    0000005599644e10 0000005599644ca8 0000005599644cd8
    0000005599644e20 0000000400000004 420ba3d700000000
    0000005599644e30 40c333333c2f4f0e 0000000300000000
    0000005599644e40 0000000000000000 0000000a00000068
    0000005599644e50 000000550000000a 0000066c7bc19191
    0000005599644e60 3d8b65883e0ed844 0000007f3d798d00
    0000005599644e70 0000000000000080 0000000000000053
    0000005599644e80 0000000000000051 0000000000000046
    0000005599644e90 0000005599644ed0 0000007f97c18000
    0000005599644ea0 0000000000000000 0000007f97c18000
    0000005599644eb0 0000000a00000068 000000010000000a
    0000005599644ec0 0000066c7ebc8211 3d8b65883e0ed844
    0000005599644ed0 61642f613d798d00 6361632d6b69766c

至此基本上能确定就是这个大小为104字节,带有0x0000000a00000068这个pattern的结构体数组覆盖正常内存导致的。

下一步就是要确定这个0x0000000a00000068属于哪个结构体。

从tombstone的maps数据中可以知道每次出现问题的地址都是堆内存,因此覆盖和被覆盖的内存也应该都是malloc出来的。

@[email protected]
    ...
    00000055996ba000-0000005599d24fff rw-  6729728  [heap]
    ...

现在我们可以用hook工具hook free函数,然后再free时判断内存里是否有0000000a00000068,如果有这个pattern就打印调用栈。

代码如下:

#if defined(__aarch64__)
extern "C" void free(void* p);
extern "C" void inject_free(void* p) {
    if(p != NULL) {
        size_t* head = (size_t*)((char*)p-sizeof(size_t));    //这里通过heap chunk的head信息得出指针的大小
        size_t size = *head & ~7,i=0;

        int* p_int = (int*)p;
        while(i < size/sizeof(int)) {
            if(*(p_int+i)== 0x00000068 && *(p_int+i+1)==0x0000000a ) {    //这里判断内存中是否有pattern
                LOGD("size=%llu",size);
                size_t j = 0;
                while(j < size/sizeof(int)) {
                    LOGD("p_int[%d]=%08x",j,*(p_int+j));
                    j++;
                }
                dump_java_stack();
                 dump_native_stack();
            }
            i++;
        }
    }
    free(p);  //这里再调用真正的free
}
#endif

打开12306复现问题,发现退出应用时,有如下日志:

# logcat |grep INJECT
D/INJECT  (  912): size=102352
D/INJECT  (  912): p_int[0]=00000068
D/INJECT  (  912): p_int[1]=0000000a
D/INJECT  (  912): p_int[2]=0000000a
D/INJECT  (  912): p_int[3]=00000055
D/INJECT  (  912): p_int[4]=0eaa916c
D/INJECT  (  912): p_int[5]=0000493a
D/INJECT  (  912): p_int[6]=bdaadb00
D/INJECT  (  912): p_int[7]=bcd10f00
D/INJECT  (  912): p_int[8]=be057140
D/INJECT  (  912): p_int[9]=00000000
...
D/INJECT  (  912): p_int[25586]=00007041
D/INJECT  (  912): p_int[25587]=00000000
D/INJECT  (  912): #00 pc 0000000000000a24  /system/lib64/libinjectapis.so (inject_free+236)
D/INJECT  (  912): #01 pc 000000000000fc28  /system/lib64/libsensorservice.so
D/INJECT  (  912): #02 pc 000000000000fcf4  /system/lib64/libsensorservice.so
D/INJECT  (  912): #03 pc 00000000000141b4  /system/lib64/libutils.so (android::RefBase::decStrong(void const*) const+560)
D/INJECT  (  912): #04 pc 0000000000029858  /system/lib64/libbinder.so (android::IPCThreadState::processPendingDerefs()+140)
D/INJECT  (  912): #05 pc 000000000002a734  /system/lib64/libbinder.so (android::IPCThreadState::joinThreadPool(bool)+68)
D/INJECT  (  912): #06 pc 0000000000031d68  /system/lib64/libbinder.so
D/INJECT  (  912): #07 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
D/INJECT  (  912): #08 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
D/INJECT  (  912): #09 pc 0000000000017224  /system/lib64/libutils.so
D/INJECT  (  912): #10 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
D/INJECT  (  912): #11 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

看来是抓到这个罪魁祸首了!

用addr2line看看是哪个文件哪一行:

$ aarch64-linux-android-addr2line -f -e symbols/system/lib64/libsensorservice.so fc28
_ZN7android13SensorService21SensorEventConnectionD1Ev
/home/mi/disk/2-v6-l-hermes-dev/frameworks/native/services/sensorservice/SensorService.cpp:1021
@frameworks/native/services/sensorservice/SensorService.cpp
SensorService::SensorEventConnection::~SensorEventConnection() {
    ALOGD_IF(DEBUG_CONNECTIONS, "~SensorEventConnection(%p)", this);
    mService->cleanupConnection(this);
    if (mEventCache != NULL) {
        delete mEventCache;
    }
}

就是delete mEventCache语句释放带有0000000a00000068的内存,看看它是被创建的代码:

@frameworks/native/services/sensorservice/SensorService.cpp

status_t SensorService::SensorEventConnection::sendEvents(
        sensors_event_t const* buffer, size_t numEvents,
        sensors_event_t* scratch,
        SensorEventConnection const * const * mapFlushEventsToConnections) {
            ...
            mEventCache = new sensors_event_t[mMaxCacheSize];

这个mEventCache确实是个结构体数组指针。这个new sensors_event_t的定义如下:

@hardware/libhardware/include/hardware/sensors.h

typedef struct sensors_event_t {
    /* must be sizeof(struct sensors_event_t) */
    int32_t version;
    /* sensor identifier */
    int32_t sensor;
    /* sensor type */
    int32_t type;
    /* reserved */
    int32_t reserved0;
    /* time is in nanosecond */
    int64_t timestamp;
    union {
        ...
    }
    /* Reserved flags for internal use. Set to zero. */
    uint32_t flags;
     uint32_t reserved1[3];
} sensors_event_t;

0x0000000a00000068中的68就是version,也就是这个结构体的大小,0x68=104字节,完全符合pattern。

接下来就是排查mEventCache,首先排除野指针,因为复现问题前并没有打印free的log,那只能是内存越界了。

再SensorService.cpp中所有能写mEventCache的地方加了log,发现复现问题时也没有相关log打印出来。

又在原生机器试了一下,发现原生没有这个问题,所以可能不是framework的SensorService代码问题。

sensors_event_t是hardware的头文件里定义的,由次可以判断sensors_event_t这个数据应该是hardware层给framework的buffer里写的。

有是只有MTk机器上出现的问题,所以问题很可能是出在hardware层。

在vendor下搜索sensors_event_t,会发现有很多sensor的实现,不太好定位问题:

vendor$ cgrep sensors_event_t
./mediatek/proprietary/hardware/sensor/InPocket.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/InPocket.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Activity.cpp:45:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Activity.cpp:225:int ActivitySensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/Shake.cpp:46:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Shake.cpp:216:int ShakeSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/PickUp.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/PickUp.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Hwmsen.h:96:    sensors_event_t mPendingEvents[numSensors];
...
./mediatek/proprietary/hardware/sensor/PickUp.cpp:215:int PickUpSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/GlanceGesture.cpp:47:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/GlanceGesture.cpp:208:int GlanceGestureSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/FaceDown.cpp:44:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/FaceDown.cpp:213:int FaceDownSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/AmbienteLight.cpp:62:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/AmbienteLight.cpp:250:int AmbiLightSensor::readEvents(sensors_event_t* data, int count)
./mediatek/proprietary/hardware/sensor/HeartRate.h:44:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/HeartRate.h:49:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Proximity.h:64:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/Proximity.h:69:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/GlanceGesture.h:45:    sensors_event_t mPendingEvent;
./mediatek/proprietary/hardware/sensor/GlanceGesture.h:50:    virtual int readEvents(sensors_event_t* data, int count);
./mediatek/proprietary/hardware/sensor/Proximity.cpp:64:    mPendingEvent.version = sizeof(sensors_event_t);
./mediatek/proprietary/hardware/sensor/Proximity.cpp:253:int ProximitySensor::readEvents(sensors_event_t* data, int count)

0x0000000a00000068中的0xa是sensors_event_t结构中的sensor的值,这个看起来是sensor ID,这些ID是在device下定义的。

@device/mediatek/common/kernel-headers/linux/hwmsensor.h
#define SENSOR_TYPE_LINEAR_ACCELERATION 10
#define SENSOR_TYPE_ROTATION_VECTOR     11
...
#define SENSOR_TYPE_SIGNIFICANT_MOTION  17
#define SENSOR_TYPE_STEP_DETECTOR       18
...
#define ID_BASE                         0
...
#define ID_ACCELEROMETER               (ID_BASE+SENSOR_TYPE_ACCELEROMETER-1)
#define ID_LINEAR_ACCELERATION         (ID_BASE+SENSOR_TYPE_LINEAR_ACCELERATION-1)
#define ID_ROTATION_VECTOR             (ID_BASE+SENSOR_TYPE_ROTATION_VECTOR-1)
#define ID_GRAVITY                     (ID_BASE+SENSOR_TYPE_GRAVITY-1)
#define ID_GYROSCOPE                   (ID_BASE+SENSOR_TYPE_GYROSCOPE-1)

这个ID是ID_ROTATION_VECTOR,这样可以进一步缩小范围

vendor$ cgrep ID_ROTATION_VECTOR
./mediatek/proprietary/hardware/sensor/nusensors.cpp:145:            case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/hwmsen_chip_info.c:766:            .handle     = ID_ROTATION_VECTOR+ID_OFFSET,
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:207:        case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:265:        case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:352:            case ID_ROTATION_VECTOR:
./mediatek/proprietary/hardware/sensor/BstSensor.cpp:411:            case ID_ROTATION_VECTOR:

这里由于对sensors_event_t不熟悉,所以后面对ID的判断其实是很虚的,所以没有继续往下跟。决定直接上coredump。

由于此时已经找到了稳定的复现路径,所以抓取coredump也容易了,一共抓了5组coredump。

对每一份做了内存分析,发现都是大量的数据被覆盖,至少是几百K,

而且看起来出问题的时候还再写(出问题时结束位置——104字节对齐不明显),起始位置也没有堆chunk的head。

其中有一个对应的tombstone很可疑:

pid: 15788, tid: 15810, name: SensorService  >>> system_server <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xa00000080
backtrace:
    #00 pc 00000000000096a0  /system/lib64/hw/sensors.mt6795.so (sensors_poll_context_t::pollEvents(sensors_event_t*, int)+400)
    #01 pc 000000000000a474  /system/lib64/libsensorservice.so
    #02 pc 0000000000010eb0  /system/lib64/libsensorservice.so
    #03 pc 00000000000179c0  /system/lib64/libutils.so (android::Thread::_threadLoop(void*)+188)
    #04 pc 000000000009277c  /system/lib64/libandroid_runtime.so (android::AndroidRuntime::javaThreadShell(void*)+96)
    #05 pc 0000000000017224  /system/lib64/libutils.so
    #06 pc 000000000001cbb0  /system/lib64/libc.so (__pthread_start(void*)+52)
    #07 pc 0000000000019044  /system/lib64/libc.so (__start_thread+16)

这个backtrace #00的sensors.mt6795.so库就是sensor hardware的代码,而当前还是确实也在操作sensors_event_t结构的指针。

加载该tombstone对应的coredump:

(gdb) core core-SensorService-15788

(gdb) bt
#0  0x0000007f6a79d6a0 in sensors_poll_context_t::pollEvents (this=0x5575a43b20, data=0x5575a60c78, count=-653) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
#1  0x0000007f6a7e6478 in android::SensorDevice::poll ([email protected]=0x5575a863a0, buffer=0x5575a49b30, [email protected]=256) at frameworks/native/services/sensorservice/SensorDevice.cpp:123
#2  0x0000007f6a7eceb4 in android::SensorService::threadLoop (this=0x5575a43880) at frameworks/native/services/sensorservice/SensorService.cpp:409
#3  0x0000007f7f99c9c4 in android::Thread::_threadLoop ([email protected]=0x5575a438a0) at system/core/libutils/Threads.cpp:776
#4  0x0000007f7fc9f780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#5  0x0000007f7f99c228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#6  0x0000007f7fe56bb4 in __pthread_start (arg=0x7f7ba4e000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#7  0x0000007f7fe53048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#8  0x0000000000000000 in ?? ()

backtrace中,#0里的count是-653,而#1的count值是256。

也 就是说android::SensorDevice::poll()调用sensors_poll_context_t::pollEvents() 时,count还是256,但在 sensors_poll_context_t::pollEvents()做处理时变成了负数。

一般count不应该是负数,所以很可能 sensors_poll_context_t::pollEvents()的逻辑有问题,

加上这个函数又是上面列出的重点怀疑的文件nusensors.cpp中定义的。所以这里出问题的可能性很大。

@vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp

int sensors_poll_context_t::pollEvents(sensors_event_t* data, int count)
{
    int nbEvents = 0;
    int n = 0;
    //ALOGE("pollEvents count =%d",count );
    do {
        // see if we have some leftover from the last poll()
        for (int i=0 ; count && i<numSensorDrivers ; i++) {
            SensorBase* const sensor(mSensors[i]);
            if ((mPollFds[i].revents & POLLIN) || (sensor->hasPendingEvents())) {
                int nb = sensor->readEvents(data, count);
                ...
                //if(nb < 0||nb > count)
                    //ALOGE("pollEvents count error nb:%d, count:%d, nbEvents:%d", nb, count, nbEvents);//for sensor NE debug
                count -= nb;
                nbEvents += nb;
                data += nb;
                //if(nb < 0||nb > count)
                //    ALOGE("pollEvents count error nb:%d, count:%d, nbEvents:%d", nb, count, nbEvents);//for sensor NE debug
            }
        }
        ...
    } while (n && count);
    return nbEvents;
}

关键是下面这一句:

nb = sensor->readEvents(data, count);

这句话的意思应该是往data指向的地址中填写event,count是要读取的event个数,nb返回的是已读取的event个数。

这种方法类似于read()函数的,while循环能保证读取到count个数的event。

而这时如果nb返回异常呢?当然,从gdb中可以确定nb确实异常了。

然而我们sensors_poll_context_t::pollEvents()中没有对这个异常值做任何的保护,所以就会出问题。

当count为负数时sensors_poll_context_t::pollEvents()的while循环很难会退出,所以越界覆盖的buffer也应该很长。

当覆盖的内存刚好被别的线程访问,那个线程就会crash,而此时应该有线程在执行sensors_poll_context_t::pollEvents()函数。

再看看其他core:

(gdb) core core-InputReader-32381

(gdb) bt
#0  art::Mutex::ExclusiveLock ([email protected]=0xa00000068, [email protected]=0x5597cc1f80) at art/runtime/base/mutex.cc:311
#1  0x0000007fafb410e0 in MutexLock (mu=..., self=0x5597cc1f80, this=<synthetic pointer>) at art/runtime/base/mutex.h:423
#2  art::gc::allocator::RosAlloc::AllocFromRun (this=0x5597c1efe0, [email protected]=0x5597cc1f80, [email protected]=20, bytes_allocated=0x75f8c7f8, [email protected]=0x7f9c42f088)
    at art/runtime/gc/allocator/rosalloc.cc:672
#3  0x0000007fafd57650 in Alloc<true> (bytes_allocated=0x7f9c42f088, size=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/allocator/rosalloc-inl.h:33
#4  AllocCommon<true> (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, num_bytes=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/space/rosalloc_space-inl.h:57
#5  AllocNonvirtual (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, num_bytes=20, self=0x5597cc1f80, this=<optimized out>) at art/runtime/gc/space/rosalloc_space.h:71
#6  TryToAllocate<false, false> (usable_size=0x7f9c42f080, bytes_allocated=0x7f9c42f078, alloc_size=20, allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, this=<optimized out>)
    at art/runtime/gc/heap-inl.h:208
#7  AllocObjectWithAllocator<false, false, art::VoidFunctor> (pre_fence_visitor=..., allocator=art::gc::kAllocatorTypeRosAlloc, byte_count=20, klass=0x6fce4650, self=0x5597cc1f80, this=<optimized out>)
    at art/runtime/gc/heap-inl.h:80
#8  Alloc<false, false> (allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, this=<optimized out>) at art/runtime/mirror/class-inl.h:561
#9  AllocObjectFromCodeInitialized<false> (method=<optimized out>, allocator_type=art::gc::kAllocatorTypeRosAlloc, self=0x5597cc1f80, klass=<optimized out>)
    at art/runtime/entrypoints/entrypoint_utils-inl.h:170

上面这个是被覆盖的线程,可以用info thread查看同一时刻其他线程的状态:

(gdb) info thread
  Id   Target Id         Frame
  100  LWP 1525          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  99   LWP 32456         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  98   LWP 1586          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  97   LWP 581           nanosleep () at bionic/libc/arch-arm64/syscalls/nanosleep.S:7
  96   LWP 2156          syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  95   LWP 32465         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  94   LWP 32464         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  ...
  21   LWP 32483         __accept4 () at bionic/libc/arch-arm64/syscalls/__accept4.S:7
  20   LWP 32444         recvmsg () at bionic/libc/arch-arm64/syscalls/recvmsg.S:7
  19   LWP 32403         sensors_poll_context_t::pollEvents (this=0x5597c20eb0, data=0x5597c21ca0, count=-402) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
  18   LWP 32413         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  ...
  5    LWP 1106          __ppoll () at bionic/libc/arch-arm64/syscalls/__ppoll.S:7
  4    LWP 923           __ioctl () at bionic/libc/arch-arm64/syscalls/__ioctl.S:7
  3    LWP 32441         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  2    LWP 32467         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
* 1    LWP 32442         art::Mutex::ExclusiveLock ([email protected]=0xa00000068, [email protected]=0x5597cc1f80) at art/runtime/base/mutex.cc:311

上面19号线程可以看出,它正在执行sensors_poll_context_t::pollEvents(),同时count值也是负数。

再看一个:

(gdb) core  core-SensorEventAckR-15999

(gdb) bt
#0  android::Looper::pollInner (this=0x558c9e0bd0, timeoutMillis=<optimized out>) at system/core/libutils/Looper.cpp:286
#1  0x0000007f9e1cc884 in android::Looper::pollOnce (this=0x558c9e0bd0, [email protected]=-1, [email protected]=0x0, [email protected]=0x0, [email protected]=0x0)
    at system/core/libutils/Looper.cpp:191
#2  0x0000007f89014708 in pollOnce (timeoutMillis=-1, this=<optimized out>) at system/core/include/utils/Looper.h:265
#3  android::SensorService::SensorEventAckReceiver::threadLoop (this=0x7f88fbf9e8) at frameworks/native/services/sensorservice/SensorService.cpp:559
#4  0x0000007f9e1c89c4 in android::Thread::_threadLoop ([email protected]=0x7f88fbf9e8) at system/core/libutils/Threads.cpp:776
#5  0x0000007f9e4cb780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#6  0x0000007f9e1c8228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#7  0x0000007f9e682bb4 in __pthread_start (arg=0x7f95f99000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#8  0x0000007f9e67f048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#9  0x0000000000000000 in ?? ()
(gdb) info thread
  Id   Target Id         Frame
  101  LWP 17316         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  100  LWP 17588         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  99   LWP 16075         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  98   LWP 16080         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  97   LWP 20823         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  96   LWP 16082         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  95   LWP 19034         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  ...
  11   LWP 16036         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  10   LWP 16392         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  9    LWP 16021         0x0000007f88fc96a0 in sensors_poll_context_t::pollEvents (this=0x558c5cbe40, data=0x558ca1fcc0, count=-3472) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:470
  8    LWP 16107         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  7    LWP 16031         __epoll_pwait () at bionic/libc/arch-arm64/syscalls/__epoll_pwait.S:9
  6    LWP 16034         0x0000007f9e738104 in ?? ()
  5    LWP 16104         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  4    LWP 16061         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  3    LWP 16060         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
  2    LWP 16085         syscall () at bionic/libc/arch-arm64/bionic/syscall.S:41
* 1    LWP 16024         android::Looper::pollInner (this=0x558c9e0bd0, timeoutMillis=<optimized out>) at system/core/libutils/Looper.cpp:286

加入debug code:

int sensors_poll_context_t::pollEvents(sensors_event_t* data, int count)
{
    int nbEvents = 0;
    int n = 0;
    //ALOGE("pollEvents count =%d",count );do {
        // see if we have some leftover from the last poll()
        for (int i=0 ; count && i<numSensorDrivers ; i++) {
            SensorBase* const sensor(mSensors[i]);
            if ((mPollFds[i].revents & POLLIN) || (sensor->hasPendingEvents())) {
                int nb = sensor->readEvents(data, count);
                if(nb < 0 || nb > count) {
                    nb += *(int*)0;    //运行到这里时,触发native crash
                }
                ...

重新抓取coredump:

(gdb) bt
#0  sensors_poll_context_t::pollEvents (this=0x55827f7610, data=0x5582c7b170, count=256) at vendor/mediatek/proprietary/hardware/sensor/nusensors.cpp:474
#1  0x0000007f92352478 in android::SensorDevice::poll (this=this@entry=0x5582be0ee0, buffer=0x5582c7b170, [email protected]=256) at frameworks/native/services/sensorservice/SensorDevice.cpp:123
#2  0x0000007f92358eb4 in android::SensorService::threadLoop (this=0x55827f7370) at frameworks/native/services/sensorservice/SensorService.cpp:409
#3  0x0000007fa75089c4 in android::Thread::_threadLoop ([email protected]=0x55827f7390) at system/core/libutils/Threads.cpp:776
#4  0x0000007fa780b780 in android::AndroidRuntime::javaThreadShell (args=<optimized out>) at frameworks/base/core/jni/AndroidRuntime.cpp:1258
#5  0x0000007fa7508228 in thread_data_t::trampoline (t=<optimized out>) at system/core/libutils/Threads.cpp:101
#6  0x0000007fa79c2bb4 in __pthread_start (arg=0x7fa35b9000, [email protected]=<error reading variable: value has been optimized out>) at bionic/libc/bionic/pthread_create.cpp:141
#7  0x0000007fa79bf048 in __start_thread (fn=<optimized out>, arg=<optimized out>) at bionic/libc/bionic/clone.cpp:41
#8  0x0000000000000000 in ?? ()

查看变量:

(gdb) info locals
nb = 2235
sensor = <optimized out>
i = 5
nbEvents = 0
n = 1

nb值(2235)确实比count值(256)大!

再看看是哪个sensor:

(gdb) p mSensors

$2 = {0x5582c2db80, 0x55827f80c0, 0x5582c22dd0, 0x5582c23ec0, 0x5582c27fa0, 0x5582be02b0, 0x5582c268b0, 0x5582c251c0, 0x5582c296a0, 0x5582c2ada0, 0x5582c2c490, 0x5582c30560, 0x5582c33340, 0x5582c31c50,

  0x5582c70a60, 0x5582c71b40, 0x5582c73230, 0x5582c74920, 0x5582c76010, 0x5582c77700, 0x5582c78df0}

i值是5,那sensor就是0x5582be02b0了。

(gdb) p *(SensorBase*) 0x5582be02b0

$3 = {_vptr.SensorBase = 0x7f923317c0 <vtable for BstSensor+16>, dev_name = 0x0, data_name = 0x7f9231c698 "/data/misc/sensor/fifo_dat", dev_fd = -1, data_fd = 40}

找到虚函数表指针为0x7f923317c0,看下虚函数表中的内容:

(gdb) x /16gx 0x7f923317c0
0x7f923317c0 <_ZTV9BstSensor+16>:    0x0000007f92317e58    0x0000007f92317e94
0x7f923317d0 <_ZTV9BstSensor+32>:    0x0000007f923181a4    0x0000007f92309ab8
0x7f923317e0 <_ZTV9BstSensor+48>:    0x0000007f92309ab0    0x0000007f92317eb8
0x7f923317f0 <_ZTV9BstSensor+64>:    0x0000007f92318020    0x0000007f92317e18
0x7f92331800 <_ZTV9BstSensor+80>:    0x0000007f92317e20    0x0000000000000001
0x7f92331810:    0x000000000000283c    0x0000000000000001
0x7f92331820:    0x0000000000002844    0x0000000000000001
0x7f92331830:    0x0000000000002851    0x0000000000000001

可以知道,这个sensor类是BstSensor。又挨个试得知对应的readEvents为0x0000007f923181a4。

(gdb) disassemble 0x0000007f923181a4
Dump of assembler code for function BstSensor::readEvents(sensors_event_t*, int):
   0x0000007f923181a4 <+0>:    stp    x29, x30, [sp,#-176]!
   0x0000007f923181a8 <+4>:    cmp    w2, wzr
   0x0000007f923181ac <+8>:    mov    x29, sp
   0x0000007f923181b0 <+12>:    stp    x19, x20, [sp,#16]

看源码:

@vendor/mediatek/proprietary/hardware/sensor/BstSensor.cpp

int BstSensor::readEvents(sensors_event_t *pdata, int count)
{
        ...while (rslt < count) {

                err = read(data_fd, &sensor_data, sizeof(sensor_data));
                ...
                delta_mod= (float)(time-bst_last_ts[index])/(float)(bst_delay[index]);

                /* if delta_mode > x.5, loopcount = x+1 */
                loopcount = (int)(delta_mod+0.5);
                ...for(int i=1; i<=loopcount; i++) {

                    pdata_cur->version = sizeof(*pdata_cur);
                    pdata_cur->sensor = sensor;
                    pdata_cur->timestamp = time - (loopcount-i)*bst_delay[index];

                    switch (sensor) {
                    ...case ID_ROTATION_VECTOR:
                            pdata_cur->data[0] = sensor_data.data.data[0];
                            pdata_cur->data[1] = sensor_data.data.data[1];
                            pdata_cur->data[2] = sensor_data.data.data[2];
                            pdata_cur->data[3] = sensor_data.data.data[3];
                            pdata_cur->data[4] = sensor_data.data.data[4];
                            pdata_cur->type = SENSOR_TYPE_ROTATION_VECTOR;
                            break;
                    default:
                            ALOGE("<BST> " "Invalid data pkt");
                            return rslt;
                    }

                    rslt++;
                    pdata_cur++;
                }
                bst_last_ts[index] = time;
        }

        return rslt;
}

关键是这个loopcount大于传入buffer的大小count的时候,返回值有可能大于count。这里可能需要做保护!

【解决方案】

diff --git a/proprietary/hardware/sensor/BstSensor.cpp b/proprietary/hardware/sensor/BstSensor.cpp
index cc21b37..ea84bea 100755
--- a/proprietary/hardware/sensor/BstSensor.cpp
+++ b/proprietary/hardware/sensor/BstSensor.cpp
@@ -320,7 +320,9 @@ int BstSensor::readEvents(sensors_event_t *pdata, int count)
         sensors_event_t *pdata_cur;
         int loopcount;
         float delta_mod;
+        int count_bk;

+        count_bk = count;
         if (count > 1) {
                 count = 1;
         }
@@ -388,7 +390,10 @@ int BstSensor::readEvents(sensors_event_t *pdata, int count)
                         bst_last_ts[index] = time;
                         loopcount = 1;
                 }
-
+                if(loopcount >=count_bk)
+                {
+                        loopcount = count_bk;
+                }
                 for(int i=1; i<=loopcount; i++) {
时间: 2024-10-12 16:18:33

MTK Sensor越界导致的系统重启问题分析报告的相关文章

Android5.0L中SensorService crash导致的systemserver重启问题分析

一.初步分析结论 sensorservice多线程机制存在问题,导致在disable accel sensor并释放相应内存和数据之后, 有很小的概率发生继续读取到未处理完的sensor事件,从而继续使用相应的内存和数据, 并且没有做相应的防御保护措施,最终引起指针地址操作错误. 二.解决方案 1.首先在可能发生错误的地方做好防御保护措施 2.对多线程进行同步,对于临界变量的操作都放置到临界区中,使用锁来保护. 三.具体分析过程 log中显示打出accel sensor被disable的信息,然

高通sensor库和Linker的死锁问题分析报告

[问题描述] 调试NativeHeap泄露时,我们会用到Android Native Heap调试框架. 在push libc_malloc_debug_leak.so后重启zygote(adb shell stop + adb shell start),会发现系统一直起不来. 用debuggerd打印system_server的调用栈,可以发现system server的大部分线程都在malloc函数里卡死: pid: 4918, tid: 4929, name: Binder_1 >>&g

Linux漏洞引起zygote重启问题分析报告

[问题现象] 启动<神马搜索>APP,系统高概率重启. [分析问题]main日志中,除了app的NE日志合zygote重启日志外,无其他明显的异常: 11-05 15:14:51.824 11179 11179 I DEBUG : pid: 23631, tid: 23693, name: ucsdk_setup >>> com.uc.searchbox <<< 11-05 15:14:51.824 11179 11179 I DEBUG : signal

三星笔记本R428安装xp win7双系统,切换系统重启才能进入系统解决办法。

三星笔记本 XP win7 双系统切换重启解决方法 三星笔记本有个奇怪的现象,就是装有XP和win7双系统    xp切换到win7.进系统是会重启一次,并且bios回复光驱为第一启动项,win7切换到XP也是一样.但是如果一直只用其中的一个系统时则没有重启的现象. 经过两天的实践我的三星R428本本终于可以一次就切换成功.现将我的经验与大家分享,希望能帮到有需要的童鞋 重启的原因如下: Bios中 advance下的AHCI Mode Control(仅供参考,不同品牌会不一样) 设置为AUT

服务器硬件问题导致虚拟机自动重启

环境:Esxi虚拟化 宿主机上面跑两台机器(20.11,21.12),插两块300G的SATA硬盘 现象:监控页面在昨天半夜到今早经常出现空缺部分(感觉应该是机器重启了): 现象如下 排查: 1.首先在21.12这台机器上使用last命令查看重启情况(没来得及截图)但是确实是系统重启过 2.查看/var/log/messages日志,锁定8:27分的日志(也就是重启的时间段) 单从日志信息上看cpu不支持变频的问题,由于在操作系统和VCS日志中均没有发现其他异常,因此怀疑是服务器硬件出了问题,去

Unix系统重启时必须注意的事项

Unix系统重启时必须注意的事项 对于系统管理员来说如何管理自己的服务器已经是再简单不过,但是如何管理好服务器却不是一个简单的事情.对于Windows服务器管理员来说经常性重启Windows设备已经成为一种生活常态,但在Unix系统中这可不是常态的事,在默认情况下重新启动不会带来任何形式的改善. 对于每一位服务器管理员来说这都算得上热门话题,但在Unix极客们眼中它则属于一种层次更深的课题--可能因为Windows管理员们往往把重启当成故障排查工作的首要步骤之一,而Unix团队则一般只在束手无策

解决虚拟内存设置错误导致的系统蓝屏无法启动问题

一次偶然设置虚拟内存由于设置过大导致系统重启后蓝屏,进入无限系统修复界面,但怎么修复都无法正常进入系统,修复过程如下: 首先得有个Ghost 系统U盘,制作方法百度. 然后开机进入U盘引导,进入Ghost系统 ①  桌面→更多工具→设置虚拟内存→初始大小500最大值设置成和物理内存一致即可 ②  桌面→修复系统引导 ③  重启进入系统,重点:然后再重新设置一下虚拟内存.

ovs2.7 在系统重启后,再次使用时提示数据库无法连接的问题。

问题现象如下,ovs开始安装后,对ovs的操作是正常的,但是,现在系统重启后,OVS的操作第一条命令就失败,如下: 问题解决方法: 参考  http://blog.csdn.net/xyq54/article/details/51371819 问题根源是ovs 需要 the ovsdb, ovs-vswitchd, ovs-vsctl, 但是关机后它们会默认关闭 [解决方法]系统启动后,输入OVS命令前先输入如下命令:

使用虚拟机克隆CentOS 6.9系统重启网卡报错问题的解决

使用虚拟机克隆CentOS6.9系统重启网卡报错问题的解决 1.错误信息 Bringing up interface eth0:  Device eth0 does not seem to be present,delaying initialization.                    [FAILED] 2.解决方法 (1)配置IP地址,重启网卡,出现如下报错 (2)这是因为克隆后的系统和原系统MAC地址和UUID一样,删除UUID和MAC地址 (3)删除网卡相关信息的文件 (4)重