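【Problem Description】
Towords (拓词, package com.towords), a popular third-party app, stops running as soon as it is opened. This happens on every system build and with every version of the app.
When the problem occurs, the system generates no tombstone file. main.log contains only the following: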
pid: 17241, tid: 17276, name: Thread-413 >>> com.towords <<< signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0000001c
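【Analysis】
Every time Towords crashed, debuggerd crashed along with it, which is why no call stack was produced.
So the first step is to find out why debuggerd dies. Start with the core dump of the crashed debuggerd: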
(gdb) bt
#0  load_symbol_table (filename=filename@entry=0x411ae05c "/data/data/com.towords/files/libprotectClass.so") at system/core/libcorkscrew/symbol_table.c:94
#1  0x401039fe in load_ptrace_map_info_data (mi=0x411ae048, pid=<optimized out>) at system/core/libcorkscrew/ptrace.c:96
#2  load_ptrace_context (pid=pid@entry=4486) at system/core/libcorkscrew/ptrace.c:112
...
The corresponding source:
@system/core/libcorkscrew/symbol_table.c
symbol_table_t* load_symbol_table(const char *filename) {
    symbol_table_t* table = NULL;
    int fd = open(filename, O_RDONLY);  // opens /data/data/com.towords/files/libprotectClass.so
    struct stat sb;
    fstat(fd, &sb);                     // return-value checks omitted in this excerpt
    size_t length = sb.st_size;
    char* base = mmap(NULL, length, PROT_READ, MAP_PRIVATE, fd, 0);  // map the whole file read-only
    Elf32_Ehdr *hdr = (Elf32_Ehdr*)base;
    Elf32_Shdr *shdr = (Elf32_Shdr*)(base + hdr->e_shoff);  // section header table, located via e_shoff
    int sym_idx = -1;
    int dynsym_idx = -1;
    for (Elf32_Half i = 0; i < hdr->e_shnum; i++) {  // e_shnum is trusted without any bounds check
        if (shdr[i].sh_type == SHT_SYMTAB) {  // <<<< faults here: searching for the symbol table
            sym_idx = i;
        }
        ...
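debuggerd crashed while scanning the section headers of libprotectClass.so for its symbol table: the index i ran past the end of the section header array.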
(gdb) disassemble
Dump of assembler code for function load_symbol_table:
   0x4012cabc <+0>:   stmdb sp!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   0x4012cac0 <+4>:   movs r1, #0
   0x4012cac2 <+6>:   sub sp, #148 ; 0x94
   ...
   0x4012cb14 <+88>:  mla r4, r1, r0, r7
=> 0x4012cb18 <+92>:  ldr r3, [r4, #4]
The fault happens on the load from r4+4 (sh_type sits at offset 4 of an Elf32_Shdr). Check the value of r4:
(gdb) info reg r4
r0             0x1d        29
r1             0x28        40
r2             0x0         0
r3             0x0         0
r4             0x4016b00c  1075228684
r5             0x4013b000  1075032064
r6             0x1         1
r7             0x4016ab84  1075227524
r8             0xffffffff  4294967295
r9             0x1         1
r10            0x1         1
r11            0x4005b6c0  1074116288
r12            0x66        102
sp             0xbebe4f88  0xbebe4f88
lr             0x40096cef  1074359535
pc             0x40103b18  0x40103b18 <load_symbol_table+92>
cpsr           0x80010030  -2147418064
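r4 lies just past the page boundary 0x4016b000, so the read has most likely run off the end of a mapping. The other registers fit this picture: r5 = 0x4013b000 is the mmap base, r7 = r5 + e_shoff (0x2fb84) = 0x4016ab84 is shdr, r0 = 29 is i, and r1 = 40 is sizeof(Elf32_Shdr), so the mla computes r4 = &shdr[29] = 0x4016ab84 + 29 * 40 = 0x4016b00c. The file is only 0x2ff1c bytes long, so its mapping ends at base + 0x30000 = 0x4016b000. Entries past the real end of the table still fall in the zero-filled tail of the last page, so the first fault only comes at i = 29. In short, the header claims more section headers than the file contains: the ELF header has most likely been tampered with.
Use readelf to inspect the header of libprotectClass.so: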
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          195460 (bytes into file)
  Flags:                             0x5000000, Version5 EABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           108 (bytes)
  Number of section headers:         102
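The last two values, Size of section headers and Number of section headers, are wrong (an Elf32_Shdr is always 40 bytes), and the Section header string table index line is missing altogether.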
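The bounds check that load_symbol_table() skips can be sketched in C. This hypothetical helper (not part of libcorkscrew) reads the section-header fields of an ELF32 little-endian header and rejects a table that would extend past the end of the file. It assumes the first 52 bytes of the file are already in memory and ignores extended section numbering.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Little-endian readers, independent of host byte order. */
static uint16_t rd16le(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32le(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 1 if the section header table described by an
 * ELF32 little-endian header lies entirely inside a file of file_size bytes. */
int elf32_shdr_table_ok(const unsigned char *hdr, size_t hdr_len, uint64_t file_size) {
    if (hdr_len < 52 || memcmp(hdr, "\x7f" "ELF", 4) != 0) return 0;
    if (hdr[4] != 1 || hdr[5] != 1) return 0;      /* ELFCLASS32, ELFDATA2LSB */

    uint32_t shoff     = rd32le(hdr + 0x20);        /* e_shoff */
    uint16_t shentsize = rd16le(hdr + 0x2E);        /* e_shentsize */
    uint16_t shnum     = rd16le(hdr + 0x30);        /* e_shnum */
    uint16_t shstrndx  = rd16le(hdr + 0x32);        /* e_shstrndx */

    if (shentsize != 40) return 0;                  /* must equal sizeof(Elf32_Shdr) */
    if ((uint64_t)shoff + (uint64_t)shnum * shentsize > file_size) return 0;
    if (shnum != 0 && shstrndx >= shnum) return 0;  /* index must name a real section */
    return 1;
}
```

Fed the first 52 bytes of the corrupted libprotectClass.so and its size 0x2ff1c, it rejects the header because e_shentsize is 108. With the restored values the table ends exactly at the end of the file and the check passes.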
A normal ELF header looks like this:
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           ARM
  Version:                           0x1
  Entry point address:               0xd4c
  Start of program headers:          52 (bytes into file)
  Start of section headers:          8568 (bytes into file)
  Flags:                             0x5000000, Version5 EABI
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         25
  Section header string table index: 24
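Evidently the header was corrupted on purpose to hinder cracking: with these values, many disassembly tools cannot parse the ELF file.
To get correct debugging output, the header has to be restored to its real values.
The real number of section headers is the .so file size (0x2ff1c) minus Start of section headers (0x2fb84), divided by the real Size of section headers, 0x28 (40). This relies on the section header table being the last thing in the file, which is the usual layout and holds here: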
(gdb) p /x (0x2ff1c-0x2fb84)/0x28
$19 = 0x17
The Section header string table index is usually the number of section headers minus one (.shstrtab is normally the last section), which here is 0x16.
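The same recovery can be written as a small C function. It is a sketch under the two assumptions stated above: the section header table runs to the end of the file, and .shstrtab is the last section. The function and struct names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t shnum;     /* recovered e_shnum */
    uint16_t shstrndx;  /* assumed e_shstrndx */
} shdr_fix_t;

/* Assumes (1) the section header table ends exactly at the end of the file
 * and (2) .shstrtab is the last section. Returns 0 on success, -1 if the
 * numbers are inconsistent with those assumptions. */
int recover_shdr_fields(uint32_t file_size, uint32_t shoff, shdr_fix_t *out) {
    const uint32_t shentsize = 40;                 /* sizeof(Elf32_Shdr) */
    if (shoff >= file_size) return -1;
    uint32_t table_bytes = file_size - shoff;
    if (table_bytes % shentsize != 0) return -1;   /* not a whole number of entries */
    uint32_t n = table_bytes / shentsize;
    if (n >= 0xff00) return -1;                    /* SHN_LORESERVE: would need extended numbering */
    out->shnum = (uint16_t)n;
    out->shstrndx = (uint16_t)(n - 1);
    return 0;
}
```

With the values from this file, recover_shdr_fields(0x2ff1c, 0x2fb84, &f) yields shnum = 0x17 and shstrndx = 0x16.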
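Then patch the corresponding bytes of libprotectClass.so with a hex editor: the little-endian 16-bit fields at file offsets 0x2E (e_shentsize), 0x30 (e_shnum) and 0x32 (e_shstrndx) change from 0x6C, 0x66, 0x78 to 0x28, 0x17, 0x16.
Before: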
00000000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00
00000010: 03 00 28 00 01 00 00 00 00 00 00 00 34 00 00 00
00000020: 84 FB 02 00 00 00 00 05 34 00 20 00 07 00 6C 00
00000030: 66 00 78 00 06 00 00 00 34 00 00 00 34 00 00 00
After:
00000000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00
00000010: 03 00 28 00 01 00 00 00 00 00 00 00 34 00 00 00
00000020: 84 FB 02 00 00 00 00 05 34 00 20 00 07 00 28 00
00000030: 17 00 16 00 06 00 00 00 34 00 00 00 34 00 00 00
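The same patch can also be applied programmatically. The following is a minimal sketch, assuming the header has already been read into a buffer (the helper name is hypothetical). It writes the three little-endian fields at 0x2E, 0x30 and 0x32. Applied to the "Before" bytes with shnum 0x17 and shstrndx 0x16, it produces the "After" bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 16-bit value in little-endian order. */
static void wr16le(unsigned char *p, uint16_t v) {
    p[0] = (unsigned char)(v & 0xFF);
    p[1] = (unsigned char)(v >> 8);
}

/* Hypothetical helper: rewrite e_shentsize, e_shnum and e_shstrndx of an
 * ELF32 little-endian header held in memory. Returns 0 on success. */
int patch_elf32_shdr_fields(unsigned char *hdr, size_t len,
                            uint16_t shnum, uint16_t shstrndx) {
    if (len < 52) return -1;          /* need the full Elf32_Ehdr */
    wr16le(hdr + 0x2E, 40);           /* e_shentsize = sizeof(Elf32_Shdr) */
    wr16le(hdr + 0x30, shnum);        /* e_shnum */
    wr16le(hdr + 0x32, shstrndx);     /* e_shstrndx */
    return 0;
}
```

To patch the file itself, read its first 52 bytes, call the helper, and write the buffer back at offset 0.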
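After pushing the patched file to the phone and reproducing the problem again, debuggerd still crashed, with an identical call stack.
The likely explanation is that the app rewrites this .so itself when it starts, so remove the file's write permission with chmod 555 libprotectClass.so.
After reproducing once more, debuggerd no longer crashes, and it now produces the call stack, the core dump, the maps file and other debugging information.
At the same time, main.log contains a new warning: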
08-06 21:35:04.303 5299 5299 W System.err: java.io.FileNotFoundException: /data/data/com.towords/files/libprotectClass.so: open failed: EACCES (Permission denied)
08-06 21:35:04.305 5299 5299 W System.err: at libcore.io.IoBridge.open(IoBridge.java:409)
08-06 21:35:04.305 5299 5299 W System.err: at java.io.FileOutputStream.<init>(FileOutputStream.java:88)
08-06 21:35:04.305 5299 5299 W System.err: at java.io.FileOutputStream.<init>(FileOutputStream.java:128)
08-06 21:35:04.306 5299 5299 W System.err: at java.io.FileOutputStream.<init>(FileOutputStream.java:117)
08-06 21:35:04.306 5299 5299 W System.err: at com.qihoo.util.StubApplication.copy(StubApplication.java:217)
08-06 21:35:04.306 5299 5299 W System.err: at com.qihoo.util.StubApplication.attachBaseContext(StubApplication.java:147)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.Application.attach(Application.java:185)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.Instrumentation.newApplication(Instrumentation.java:991)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.Instrumentation.newApplication(Instrumentation.java:975)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.LoadedApk.makeApplication(LoadedApk.java:504)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.ActivityThread.handleBindApplication(ActivityThread.java:4314)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.ActivityThread.access$1500(ActivityThread.java:138)
08-06 21:35:04.306 5299 5299 W System.err: at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1261)
08-06 21:35:04.306 5299 5299 W System.err: at android.os.Handler.dispatchMessage(Handler.java:102)
08-06 21:35:04.307 5299 5299 W System.err: at android.os.Looper.loop(Looper.java:136)
08-06 21:35:04.307 5299 5299 W System.err: at android.app.ActivityThread.main(ActivityThread.java:5016)
08-06 21:35:04.307 5299 5299 W System.err: at java.lang.reflect.Method.invokeNative(Native Method)
08-06 21:35:04.307 5299 5299 W System.err: at java.lang.reflect.Method.invoke(Method.java:515)
08-06 21:35:04.307 5299 5299 W System.err: at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:792)
08-06 21:35:04.307 5299 5299 W System.err: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:608)
08-06 21:35:04.307 5299 5299 W System.err: at dalvik.system.NativeStart.main(Native Method)
08-06 21:35:04.307 5299 5299 W System.err: Caused by: libcore.io.ErrnoException: open failed: EACCES (Permission denied)
08-06 21:35:04.308 5299 5299 W System.err: at libcore.io.Posix.open(Native Method)
08-06 21:35:04.308 5299 5299 W System.err: at libcore.io.BlockGuardOs.open(BlockGuardOs.java:110)
08-06 21:35:04.308 5299 5299 W System.err: at libcore.io.IoBridge.open(IoBridge.java:393)
08-06 21:35:04.309 5299 5299 W System.err: ... 20 more
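So the program does rewrite libprotectClass.so at startup. The failure is only logged as a warning (W), so denying the write does not stop the program from running.
The class name com.qihoo.util.StubApplication suggests that Towords is protected by a security (packing) framework from Qihoo.
Back to the main question: how does Towords itself crash?
With the app's core dump, maps file and tombstone in hand, we can now analyze it thoroughly.
The tombstone shows the following: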
pid: 5299, tid: 5397, name: Thread-333  >>> com.towords <<<
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0000001c
    r0 753d0628  r1 00000000  r2 42da5e60  r3 00000000
    r4 42da5e60  r5 42da5e60  r6 00000000  r7 7598f7d8
    r8 7598fb10  r9 7539ff0c  sl 00000001  fp 7598fb24
    ip 1d300001  sp 7598f748  lr 415479e7  pc 4155ac2e  cpsr 600b0030

backtrace:
    #00  pc 0005fc2e  /system/lib/libdvm.so (dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)+9)
    #01  pc 0004c9e3  /system/lib/libdvm.so
    #02  pc 0000ebbb  <unknown>
Analyze the core with gdb:
(gdb) disassemble
Dump of assembler code for function dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list):
   0x4155ac24 <+0>:   stmdb sp!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
   0x4155ac28 <+4>:   mov r10, r3
   0x4155ac2a <+6>:   sub sp, #28
   0x4155ac2c <+8>:   movs r3, #0
=> 0x4155ac2e <+10>:  ldr r5, [r1, #28]
   0x4155ac30 <+12>:  mov r6, r0
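The crash is caused by r1 being NULL: ldr r5, [r1, #28] with r1 = 0 reads address 0x1c, which is exactly the fault addr 0000001c. r1 is the Method* argument, passed in by the caller.
Look for the caller's return address on the stack, starting from sp: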
0x7598f748: 0x42da5e60  0x415b5bd8  0x00000014  0x415245cc
0x7598f758: 0x42da5e60  0x753d0628  0x415a6c6c  0x42da5e60
                                                r4
0x7598f768: 0x42da5e60  0x00000000  0x7598f7d8  0x7598fb10
            r5          r6          r7          r8
0x7598f778: 0x7539ff0c  0x753d0638  0x7598fb24  0x415479e7
            r9          r10         r11         lr
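The register labels in this dump follow from the prologue of dvmCallMethodV(). stmdb pushes nine registers (lr last), and sub sp, #28 reserves 28 bytes of locals. Because the fault happens after the sub, the saved lr sits at sp + 28 + 4 * 8. A small C sketch of that arithmetic follows (function names hypothetical; only valid when the faulting pc is past the whole prologue):

```c
#include <stdint.h>

/* Assumes a prologue of the form
 *     push/stmdb {<nregs registers, lr last>}
 *     sub  sp, #local
 * and that sp was sampled after both instructions ran. */

/* Address of the stack slot holding the saved lr. */
uint32_t saved_lr_slot(uint32_t sp, uint32_t local, uint32_t nregs) {
    return sp + local + 4u * (nregs - 1u);
}

/* Value of sp at the call site, i.e. before the push. */
uint32_t caller_sp(uint32_t sp, uint32_t local, uint32_t nregs) {
    return sp + local + 4u * nregs;
}

/* Bit 0 of a return address only marks Thumb state; clear it to get
 * the address of the instruction after the call. */
uint32_t return_insn_addr(uint32_t lr) {
    return lr & ~1u;
}
```

For dvmCallMethodV(), saved_lr_slot(0x7598f748, 28, 9) gives 0x7598f784, where 0x415479e7 is stored. caller_sp() gives 0x7598f788, the sp of the caller. For a caller with a five-register push and sub sp, #28, the same functions applied to 0x7598f788 give 0x7598f7b4 and 0x7598f7b8.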
The saved lr, 0x415479e7, places the call site just before 0x415479e6 (bit 0 only marks Thumb state):
(gdb) disassemble 0x415479e6
Dump of assembler code for function NewObjectV(JNIEnv*, jclass, jmethodID, va_list):
   0x415479a0 <+0>:   push {r4, r5, r6, r7, lr}
   0x415479a2 <+2>:   mov r5, r0
   0x415479a4 <+4>:   sub sp, #28
   0x415479a6 <+6>:   mov r4, r1
   0x415479a8 <+8>:   add r0, sp, #12
   0x415479aa <+10>:  mov r1, r5
   0x415479ac <+12>:  mov r6, r2          ; jmethodID
   0x415479ae <+14>:  mov r7, r3
   0x415479b0 <+16>:  bl 0x41543c88 <ScopedJniThreadState::ScopedJniThreadState(_JNIEnv*)>
   0x415479b4 <+20>:  mov r1, r4
   0x415479b6 <+22>:  ldr r0, [sp, #12]
   0x415479b8 <+24>:  bl 0x41544d00 <dvmDecodeIndirectRef(Thread*, _jobject*)>
   0x415479bc <+28>:  mov r4, r0
   0x415479be <+30>:  bl 0x41543974 <canAllocClass(ClassObject*)>
   0x415479c2 <+34>:  cbz r0, 0x41547a02 <NewObjectV(JNIEnv*, jclass, jmethodID, va_list)+98>
   0x415479c4 <+36>:  ldr r3, [r4, #44]   ; 0x2c
   0x415479c6 <+38>:  cmp r3, #7
   0x415479c8 <+40>:  beq.n 0x415479e8 <NewObjectV(JNIEnv*, jclass, jmethodID, va_list)+72>
   0x415479ca <+42>:  mov r0, r4
   0x415479cc <+44>:  bl 0x41566010 <dvmInitClass(ClassObject*)>
   0x415479d0 <+48>:  cbnz r0, 0x415479e8 <NewObjectV(JNIEnv*, jclass, jmethodID, va_list)+72>
   0x415479d2 <+50>:  b.n 0x41547a02 <NewObjectV(JNIEnv*, jclass, jmethodID, va_list)+98>
   0x415479d4 <+52>:  add r3, sp, #16
   0x415479d6 <+54>:  ldr r0, [sp, #12]
   0x415479d8 <+56>:  mov r1, r6          ; jmethodID
   0x415479da <+58>:  mov r2, r4          ; Object*
   0x415479dc <+60>:  stmia.w sp, {r3, r7}
   0x415479e0 <+64>:  movs r3, #1
   0x415479e2 <+66>:  bl 0x4155ac24 <dvmCallMethodV(Thread*, Method const*, Object*, bool, JValue*, std::__va_list)>
=> 0x415479e6 <+70>:  b.n 0x41547a04 <NewObjectV(JNIEnv*, jclass, jmethodID, va_list)+100>
   0x415479e8 <+72>:  mov r0, r4
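Here the Method* is the jmethodID argument of NewObjectV(), so it too was passed in by the caller. Continue walking the stack to find that caller: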
0x7598f788: 0x7598f798  0x7598f7d8  0x00000000  0x753d0628
0x7598f798: 0x4185ceb0  0x415477c5  0x753d0cc8  0x415479a1
                                                r4
0x7598f7a8: 0x753d0cc8  0x754034bc  0x753ff80a  0x753f0bbd
            r5          r6          r7          lr
The saved lr is 0x753f0bbd; disassemble the code around it:
    0x753f0b84: push {r3}
    0x753f0b86: push {r0, r1, r4, r5, r6, r7, lr}
    0x753f0b88: ldr r3, [r0, #0]
    0x753f0b8a: adds r5, r0, #0
    0x753f0b8c: adds r7, r2, #0
    0x753f0b8e: ldr r3, [r3, #24]
    0x753f0b90: blx r3
    0x753f0b92: ldr r6, [pc, #52]
    0x753f0b94: adds r1, r0, #0
    0x753f0b96: add r6, pc
    0x753f0b98: str r0, [r6, #0]
    0x753f0b9a: cmp r0, #0
    0x753f0b9c: beq.n 0x753f0bbe
    0x753f0b9e: ldr r3, [r5, #0]
    0x753f0ba0: adds r2, r7, #0
    0x753f0ba2: adds r0, r5, #0
    0x753f0ba4: adds r3, #8
    0x753f0ba6: ldr r4, [r3, #124]   ; 0x7c
    0x753f0ba8: ldr r3, [sp, #28]
    0x753f0baa: blx r4
    0x753f0bac: adds r2, r0, #0
    0x753f0bae: ldr r0, [r5, #0]
    0x753f0bb0: add r3, sp, #32
    0x753f0bb2: str r3, [sp, #4]
    0x753f0bb4: ldr r4, [r0, #116]   ; 0x74
    0x753f0bb6: ldr r1, [r6, #0]
    0x753f0bb8: adds r0, r5, #0
==> 0x753f0bba: blx r4               ; NewObjectV(JNIEnv*, jclass, jmethodID, va_list)
    0x753f0bbc: str r0, [r6, #4]
    0x753f0bbe: pop {r0, r1, r4, r5, r6, r7}
The third argument passed to NewObjectV(), r2, is the Method* (jmethodID). It is the return value of this call:
0x753f0baa: blx r4
The code that sets up r4:
0x753f0b86: push {r0, r1, r4, r5, r6, r7, lr}
0x753f0b8a: adds r5, r0, #0
0x753f0b9e: ldr r3, [r5, #0]
0x753f0ba4: adds r3, #8
0x753f0ba6: ldr r4, [r3, #124]   ; 0x7c
0x753f0ba8: ldr r3, [sp, #28]
0x753f0baa: blx r4
The stack contents at that point are:
0x7598f7b8: 0x753d0cc8  0x7598f7d8  0x753d0cc8  0x753c86a8
            r0          r1          r4          r5
0x7598f7c8: 0x753d0cc8  0x400c6384  0x753f0d35  0x753ff19c
            r6          r7          lr          r3
From this, r4 works out to 0x415477c5:
0x753f0b86: push {r0, r1, r4, r5, r6, r7, lr}
0x753f0b8a: adds r5, r0, #0      ; r5 = r0 = 0x753d0cc8 (saved r0 on the stack)
0x753f0b9e: ldr r3, [r5, #0]     ; r3 = [r5] = [0x753d0cc8] = 0x415a43ec (env->functions)
0x753f0ba4: adds r3, #8          ; r3 = 0x415a43f4
0x753f0ba6: ldr r4, [r3, #124]   ; r4 = [0x415a43f4 + 124] = [0x415a4470] = 0x415477c5
0x753f0baa: blx r4
This function is GetMethodID():
(gdb) disassemble 0x415477c5
Dump of assembler code for function GetMethodID(JNIEnv*, jclass, char const*, char const*):
   0x415477c4 <+0>:   stmdb sp!, {r4, r5, r6, r7, r8, r9, lr}
   0x415477c8 <+4>:   mov r5, r0
   0x415477ca <+6>:   sub sp, #20
   0x415477cc <+8>:   mov r4, r1
   ...
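The slot arithmetic used here can be checked against the order of the JNINativeInterface function table in jni.h, where each entry is a 4-byte pointer on 32-bit ARM. The following C sketch lists only the first 34 slots (helper names hypothetical):

```c
#include <stddef.h>
#include <string.h>

/* First 34 entries of JNINativeInterface, in jni.h order.
 * On 32-bit ARM each entry is a 4-byte function pointer. */
static const char *const kJniSlots[] = {
    "reserved0", "reserved1", "reserved2", "reserved3",
    "GetVersion", "DefineClass", "FindClass",
    "FromReflectedMethod", "FromReflectedField", "ToReflectedMethod",
    "GetSuperclass", "IsAssignableFrom", "ToReflectedField",
    "Throw", "ThrowNew", "ExceptionOccurred", "ExceptionDescribe",
    "ExceptionClear", "FatalError", "PushLocalFrame", "PopLocalFrame",
    "NewGlobalRef", "DeleteGlobalRef", "DeleteLocalRef", "IsSameObject",
    "NewLocalRef", "EnsureLocalCapacity", "AllocObject",
    "NewObject", "NewObjectV", "NewObjectA",
    "GetObjectClass", "IsInstanceOf", "GetMethodID",
};

#define JNI_SLOT_COUNT (sizeof kJniSlots / sizeof kJniSlots[0])

/* Name of the function at a byte offset into env->functions, or NULL. */
const char *jni_slot_name(size_t offset) {
    if (offset % 4 != 0 || offset / 4 >= JNI_SLOT_COUNT) return NULL;
    return kJniSlots[offset / 4];
}

/* Byte offset of a named function, or -1 if it is not in this partial table. */
long jni_slot_offset(const char *name) {
    for (size_t i = 0; i < JNI_SLOT_COUNT; i++)
        if (strcmp(kJniSlots[i], name) == 0) return (long)(4 * i);
    return -1;
}
```

Offset 24 maps to FindClass, 116 to NewObjectV, and 8 + 124 = 132 to GetMethodID. These match the three indirect calls through env->functions in the disassembly of the caller.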
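So the jmethodID comes from the VM's GetMethodID(), and that call returned 0. Per the JNI specification, GetMethodID() returns NULL (and throws NoSuchMethodError) when the method cannot be found. This code does not check the result and passes it straight on to NewObjectV().
Next, find out which method it was looking up; that means decoding its arguments.
Code related to the first argument: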
0x753f0b84: push {r3}
0x753f0b86: push {r0, r1, r4, r5, r6, r7, lr}
0x753f0b8a: adds r5, r0, #0      ; r5 = r0 = 0x753d0cc8
0x753f0ba2: adds r0, r5, #0      ; r0 = r5 = 0x753d0cc8
0x753f0baa: blx r4
From the signature GetMethodID(JNIEnv*, jclass, char const*, char const*), the first argument, r0 = 0x753d0cc8, is the JNIEnv*.
Code related to the second argument:
0x753f0b90: blx r3
0x753f0b92: ldr r6, [pc, #52]    ; r6 = [0x753f0bc8] = 0x00012922
0x753f0b94: adds r1, r0, #0      ; r1 = r0 (return value of blx r3)
0x753f0b96: add r6, pc           ; r6 += pc (0x753f0b96 + 4 = 0x753f0b9a), so r6 = 0x754034bc
0x753f0b98: str r0, [r6, #0]     ; [0x754034bc] = r0; reading it back from the core gives 0x4185ceb0
0x753f0baa: blx r4               ; GetMethodID(JNIEnv*, jclass, char const*, char const*)
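r1 equals the return value r0 of blx r3, and that r0 is also stored to the memory r6 points to. Reading that location from the core therefore gives r1 = 0x4185ceb0.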
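The two PC-relative instructions follow the Thumb rules. For ldr rX, [pc, #imm], the base is the instruction address plus 4, rounded down to a multiple of 4. add rX, pc reads PC as the instruction address plus 4, with no rounding. A C sketch (function names hypothetical):

```c
#include <stdint.h>

/* Thumb "ldr rX, [pc, #imm]": base is Align(insn + 4, 4). */
uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint32_t imm) {
    return ((insn_addr + 4u) & ~3u) + imm;
}

/* Thumb "add rX, pc": PC reads as insn + 4, not aligned. */
uint32_t thumb_add_pc(uint32_t rx, uint32_t insn_addr) {
    return rx + insn_addr + 4u;
}
```

thumb_ldr_literal_addr(0x753f0b92, 52) is 0x753f0bc8, the literal holding 0x00012922. thumb_add_pc(0x12922, 0x753f0b96) is 0x754034bc, which matches the r6 value that NewObjectV() saved on the stack.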
From the signature GetMethodID(JNIEnv*, jclass, char const*, char const*), the second argument is a ClassObject*:
(gdb) p *(ClassObject*)0x4185ceb0
$14 = {
  <Object> = {
    clazz = 0x416cc1e8,
    lock = 0
  },
  members of ClassObject:
  instanceData = {0, 0, 0, 0},
  descriptor = 0x6f21a8b9 <Address 0x6f21a8b9 out of bounds>,
  ...
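The descriptor cannot be read from the core, presumably because it lies in a read-only, file-backed mapping that was not dumped. The maps file shows that the address falls inside /data/dalvik-cache/s[email protected]@[email protected]: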
6ec5c000-6edd4000 r--p 00000000 b3:1b 40972 /data/dalvik-cache/[email protected]@[email protected]
...
6f14b000-6f14c000 r--p 004ef000 b3:1b 40972 /data/dalvik-cache/[email protected]@[email protected]
6f14c000-6f5b0000 r--p 004f0000 b3:1b 40972 /data/dalvik-cache/[email protected]@[email protected]
6f5b0000-6f669000 rw-p 00000000 00:04 9331 /dev/ashmem/dalvik-aux-structure (deleted)
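Compute the offset relative to the file's first mapping. The mappings are contiguous (each starts 0x6ec5c000 + its file offset), so this difference is the file offset: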
(gdb) p /x 0x6f21a8b9-0x6ec5c000
$15 = 0x5be8b9
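The offset can also be computed from a single maps line as pgoff + (addr - start). That form does not rely on the mappings being contiguous. The following C sketch (function name hypothetical) parses the start, end and offset fields of a /proc/<pid>/maps line:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Translate a virtual address to a file offset using one line of
 * /proc/<pid>/maps: "start-end perms offset dev inode path".
 * Returns 0 and sets *file_off if addr lies inside the mapping, else -1. */
int maps_vaddr_to_file_offset(const char *line, uint64_t addr, uint64_t *file_off) {
    uint64_t start, end, pgoff;
    /* %*s skips the permission field without storing it. */
    if (sscanf(line, "%" SCNx64 "-%" SCNx64 " %*s %" SCNx64, &start, &end, &pgoff) != 3)
        return -1;
    if (addr < start || addr >= end)
        return -1;
    *file_off = pgoff + (addr - start);
    return 0;
}
```

With the 6f14c000-6f5b0000 line (file offset 0x4f0000) and the descriptor address 0x6f21a8b9, it returns 0x5be8b9, the same value as the subtraction above.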
Pull /data/dalvik-cache/[email protected]@[email protected] from the phone and inspect it in a hex editor:
@/data/dalvik-cache/[email protected]@[email protected]
0x5be8b9: 24 4C 61 6E 64 72 6F 69 64 2F 74 65 6C 65 70 68 6F 6E 79 2F 54 65 6C 65 70 68 6F 6E 79 4D 61 6E 61 67 65 72 3B 00
          $Landroid/telephony/TelephonyManager;
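So the object's class is android/telephony/TelephonyManager. The leading byte 0x24, shown as $ in the ASCII column, is not part of the name: it is the ULEB128 length prefix of the dex string, 36, which is exactly the length of Landroid/telephony/TelephonyManager;.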
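A dex string_data_item stores a ULEB128 length followed by the MUTF-8 data and a NUL terminator, so a dump like the one above can be decoded with a small C helper (name hypothetical). The decoded value is the length in UTF-16 code units, which equals the byte count only for ASCII names such as this one:

```c
#include <stddef.h>
#include <stdint.h>

/* Decode an unsigned LEB128 value (dex limits these to 5 bytes).
 * Returns the number of bytes consumed, or 0 if the input is truncated
 * or longer than 5 bytes. */
size_t uleb128_decode(const unsigned char *p, size_t len, uint32_t *out) {
    uint32_t result = 0;
    for (size_t i = 0; i < len && i < 5; i++) {
        result |= (uint32_t)(p[i] & 0x7F) << (7 * i);   /* low 7 bits first */
        if ((p[i] & 0x80) == 0) {                       /* high bit clear: last byte */
            *out = result;
            return i + 1;
        }
    }
    return 0;
}
```

For the bytes above, the helper consumes one byte (0x24) and returns a length of 36. The name then occupies the next 36 bytes, followed by the 00 terminator.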
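Working back the same way, blx r3 is easily shown to be a call to FindClass(): it jumps through offset 24 of env->functions, which is the FindClass slot of the JNI function table.
In other words, the code obtained the android/telephony/TelephonyManager class through FindClass().
Code related to the third argument: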
0x753f0ba0: adds r2, r7, #0
r7 can be read straight from the stack frame of the callee, NewObjectV(), which saved it in its prologue:
0x7598f788: 0x7598f798  0x7598f7d8  0x00000000  0x753d0628
                                                Thread*
0x7598f798: 0x4185ceb0  0x415477c5  0x753d0cc8  0x415479a1
                                                r4
0x7598f7a8: 0x753d0cc8  0x754034bc  0x753ff80a  0x753f0bbd
            r5          r6          r7          lr
r2 = r7 = 0x753ff80a
From the signature GetMethodID(JNIEnv*, jclass, char const*, char const*), this argument is a string:
(gdb) x /s 0x753ff80a
<init>
So the third argument is the string "<init>".
Code related to the fourth argument:
0x753f0b84: push {r3}                           ; [0x7598f7d4] = r3, sp = 0x7598f7d4
0x753f0b86: push {r0, r1, r4, r5, r6, r7, lr}   ; sp -= 28, sp = 0x7598f7b8
0x753f0ba8: ldr r3, [sp, #28]                   ; r3 = [sp + 28] = [0x7598f7d4]
0x753f0baa: blx r4                              ; GetMethodID(JNIEnv*, jclass, char const*, char const*)
r3 is therefore 0x753ff19c, the value pushed onto the stack by the first instruction:
0x7598f7b8: 0x753d0cc8  0x7598f7d8  0x753d0cc8  0x753c86a8
            r0          r1          r4          r5
0x7598f7c8: 0x753d0cc8  0x400c6384  0x753f0d35  0x753ff19c
            r6          r7          lr          r3
From the signature GetMethodID(JNIEnv*, jclass, char const*, char const*), this one is also a string:
(gdb) x /s 0x753ff19c
()V
Putting it all together, the logic is roughly:
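jclass localClass = env->FindClass("android/telephony/TelephonyManager");
jmethodID localMethodID = env->GetMethodID(localClass, "<init>", "()V");  // returns NULL: no such constructor
jobject localObject = env->NewObject(localClass, localMethodID);          // the binary calls NewObjectV(); a NULL jmethodID crashes in dvmCallMethodV()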
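In other words, the app dies while calling the default constructor of android/telephony/TelephonyManager.
A search of our code showed that frameworks/telephony/base/java/android/telephony/TelephonyManager.java indeed has no default constructor,
whereas the stock AOSP code does have one.
The commit history showed that a colleague had removed it because nothing appeared to call it.
【Solution】
After the default constructor was added back, the app no longer crashes.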