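After a long while, here is the third post in this series.
Anyone who has played with Mono probably knows it ships a tool called mkbundle, which packs the Mono runtime, the class libraries, and your program's dependent assemblies into a single executable. On Windows the result is an exe file, for example mandroid.exe and mtouch.exe; on Mac it is a Mach-O file, for example mtouch and mtouch-64.
From its source code at https://github.com/mono/mono/tree/master/mcs/tools/mkbundle we get: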
template_main.c
int main (int argc, char* argv[])
{
    char **newargs;
    int i, k = 0;

    newargs = (char **) malloc (sizeof (char *) * (argc + 2 + count_mono_options_args ()));

    newargs [k++] = argv [0];

    if (mono_options != NULL) {
        i = 0;
        while (mono_options[i] != NULL)
            newargs[k++] = mono_options[i++];
    }

    newargs [k++] = image_name;

    for (i = 1; i < argc; i++) {
        newargs [k++] = argv [i];
    }
    newargs [k] = NULL;

    if (config_dir != NULL && getenv ("MONO_CFG_DIR") == NULL)
        mono_set_dirs (getenv ("MONO_PATH"), config_dir);

    mono_mkbundle_init();

    return mono_main (k, newargs);
}
Note that it calls the function mono_mkbundle_init, which has two implementations, located at:
https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template.c
and
https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c
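Depending on whether the -z option (compress the assemblies) is given, the tool uses the mono_mkbundle_init implementation from template.c or from template_z.c. We usually choose compression, so the latter is normally the one in use.
Looking at https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c: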
void mono_mkbundle_init ()
{
    CompressedAssembly **ptr;
    MonoBundledAssembly **bundled_ptr;
    Bytef *buffer;
    int nbundles;

    install_dll_config_files ();

    ptr = (CompressedAssembly **) compressed;
    nbundles = 0;
    while (*ptr++ != NULL)
        nbundles++;

    bundled = (MonoBundledAssembly **) malloc (sizeof (MonoBundledAssembly *) * (nbundles + 1));
    bundled_ptr = bundled;
    ptr = (CompressedAssembly **) compressed;
    while (*ptr != NULL) {
        uLong real_size;
        uLongf zsize;
        int result;
        MonoBundledAssembly *current;

        real_size = (*ptr)->assembly.size;
        zsize = (*ptr)->compressed_size;
        buffer = (Bytef *) malloc (real_size);
        result = my_inflate ((*ptr)->assembly.data, zsize, buffer, real_size);
        if (result != 0) {
            fprintf (stderr, "mkbundle: Error %d decompressing data for %s\n", result, (*ptr)->assembly.name);
            exit (1);
        }
        (*ptr)->assembly.data = buffer;
        current = (MonoBundledAssembly *) malloc (sizeof (MonoBundledAssembly));
        memcpy (current, *ptr, sizeof (MonoBundledAssembly));
        current->name = (*ptr)->assembly.name;
        *bundled_ptr = current;
        bundled_ptr++;
        ptr++;
    }
    *bundled_ptr = NULL;
    mono_register_bundled_assemblies((const MonoBundledAssembly **) bundled);
}
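We can see that decompression uses compressed, a variable that is not defined in this file. From the tool's source code we learn that it is an array of pointers to the following structure: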
typedef struct {
    const char *name;
    const unsigned char *data;
    const unsigned int size;
} MonoBundledAssembly;

typedef struct _compressed_data {
    MonoBundledAssembly assembly;
    int compressed_size;
} CompressedAssembly;
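In other words, once we find mono_mkbundle_init in the bundled program and the place where it references compressed, we have found an array with one entry per assembly, terminated by a NULL entry. Each entry is pointer-sized (32 bits, or 64 bits when the bundle target is 64-bit) and points to a CompressedAssembly structure. (This is hard to describe, so just keep reading the code I give.)
Because compressed points to constant data, it usually lives in a section of the executable named something like .data or .const.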
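To make that layout concrete, here is a minimal standalone Python 3 sketch (not IDAPython) that walks such a table inside a flat byte buffer. The names `read_cstr`, `parse_compressed_table` and `build_demo_image` are hypothetical, the image is synthetic, and the sketch assumes that addresses equal buffer offsets and that a 64-bit target pads the `size` field to 8 bytes. A real binary would need VA-to-file-offset translation first.

```python
import struct
import zlib

def formats(is64):
    """Return (pointer format, record format) for the assumed struct layout."""
    # Record: name*, data*, unsigned size, [4 pad bytes on 64-bit], int compressed_size
    return ("<Q", "<QQI4xi") if is64 else ("<I", "<IIIi")

def read_cstr(image, off):
    """Read a NUL-terminated ASCII string starting at off."""
    end = image.index(b"\x00", off)
    return image[off:end].decode("ascii")

def parse_compressed_table(image, table_off, is64):
    """Walk the NULL-terminated pointer array and decode each record."""
    ptr_fmt, rec_fmt = formats(is64)
    ptr_size = struct.calcsize(ptr_fmt)
    entries = []
    while True:
        (rec_off,) = struct.unpack_from(ptr_fmt, image, table_off)
        if rec_off == 0:          # NULL entry terminates the array
            break
        name_off, data_off, size, zsize = struct.unpack_from(rec_fmt, image, rec_off)
        entries.append({
            "name": read_cstr(image, name_off),
            "data": image[data_off:data_off + zsize],
            "size": size,
            "compressed_size": zsize,
        })
        table_off += ptr_size
    return entries

def build_demo_image(is64):
    """Build a tiny synthetic image holding one compressed 'assembly'."""
    ptr_fmt, rec_fmt = formats(is64)
    payload = b"MZ" + b"\x00" * 62
    packed = zlib.compress(payload)
    image = bytearray(16)         # offset 0 stays unused so that 0 can mean NULL
    name_off = len(image)
    image += b"demo.dll\x00"
    data_off = len(image)
    image += packed
    rec_off = len(image)
    image += struct.pack(rec_fmt, name_off, data_off, len(payload), len(packed))
    table_off = len(image)
    image += struct.pack(ptr_fmt, rec_off) + struct.pack(ptr_fmt, 0)
    return bytes(image), table_off, payload

image, table_off, payload = build_demo_image(is64=False)
entries = parse_compressed_table(image, table_off, is64=False)
print(entries[0]["name"], entries[0]["size"], entries[0]["compressed_size"])
```

The full script further down reads the same fields through IDA's database accessors instead of a raw buffer.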
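Because a bundled program such as mandroid.exe usually carries no symbols at all, locating mono_mkbundle_init and compressed is not easy and normally takes manual judgment. I wanted to automate it. Analysis of various versions of the Xa***** assemblies shows that, as long as there are no major changes at the C source level, the data-reference offsets in the assembly generated for a given statement may change, but if you ignore the data references, the instruction sequence and its order usually stay fixed. We can therefore use this feature to locate the reference to compressed inside mono_mkbundle_init, and from it the virtual address (VA) of compressed in the executable.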
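The idea of matching the instruction sequence while ignoring the data-reference operands can be sketched as a byte pattern with wildcards. This is a standalone Python 3 illustration, not part of the script below. The pattern syntax, the helper names `parse_pattern` and `find_pattern`, and the sample bytes (an x86 `mov eax, [addr]` followed by `test eax, eax`) are all hypothetical and not taken from any real mandroid.exe.

```python
import struct

def parse_pattern(text):
    """Turn 'A1 ?? ?? ?? ?? 85 C0' into a list of ints, with None for wildcards."""
    return [None if tok == "??" else int(tok, 16) for tok in text.split()]

def find_pattern(blob, pattern):
    """Yield every offset in blob where the masked pattern matches."""
    n = len(pattern)
    for start in range(len(blob) - n + 1):
        if all(p is None or blob[start + i] == p for i, p in enumerate(pattern)):
            yield start

# Opcode bytes are fixed; the 4-byte address operand is wildcarded because
# it changes from build to build.
pattern = parse_pattern("A1 ?? ?? ?? ?? 85 C0")
blob = bytes.fromhex("9090A17856341285C07405")

hit = next(find_pattern(blob, pattern))
# The wildcarded operand is the address we are after.
(address,) = struct.unpack_from("<I", blob, hit + 1)
print(hex(hit), hex(address))
```

A fixed pattern like this only holds for one compiler and architecture, which is one reason to prefer anchoring on a string reference when one is available.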
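Next we bring out the great leaked IDA Pro 6.5 (if you don't have it, search Baidu; the pediy resource section has it).
We know the function contains the constant string [mkbundle: Error %d decompressing data for %s\n] (depending on the Windows or Mac compiler, the leading "mkbundle: " is sometimes absent), and usually only one function in the whole program references it. That gives us mono_mkbundle_init, which an IDAPython script can find. The first reference to the data segment inside that function is the reference to the compressed variable. Here is the code: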
#!/usr/bin/env python
# coding=gbk
# Supports unpacking mtouch, mtouch-64, mtouch.exe and mandroid.exe.
# Open the target file in IDA, wait for the analysis to finish, then run this script.
# It creates a temporary folder next to the target file and extracts the files into it.
# Written for IDAPython on Python 2 (IDA 6.5).
# by BinSys

import os
import zlib
import StringIO, gzip
from datetime import datetime

import idaapi
import idc
import idautils

InputFileType_EXE = 11
InputFileType_MachO = 25
InputFileType = -1
Is64Bit = False

string_type_map = {
    0 : "ASCSTR_C",       # C-string, zero terminated
    1 : "ASCSTR_PASCAL",  # Pascal-style ASCII string (length byte)
    2 : "ASCSTR_LEN2",    # Pascal-style, length is 2 bytes
    3 : "ASCSTR_UNICODE", # Unicode string
    4 : "ASCSTR_LEN4",    # Delphi string, length is 4 bytes
    5 : "ASCSTR_ULEN2",   # Pascal-style Unicode, length is 2 bytes
    6 : "ASCSTR_ULEN4",   # Pascal-style Unicode, length is 4 bytes
}

filetype_t_map = {
    0 : "f_EXE_old",  # MS DOS EXE File
    1 : "f_COM_old",  # MS DOS COM File
    2 : "f_BIN",      # Binary File
    3 : "f_DRV",      # MS DOS Driver
    4 : "f_WIN",      # New Executable (NE)
    5 : "f_HEX",      # Intel Hex Object File
    6 : "f_MEX",      # MOS Technology Hex Object File
    7 : "f_LX",       # Linear Executable (LX)
    8 : "f_LE",       # Linear Executable (LE)
    9 : "f_NLM",      # Netware Loadable Module (NLM)
    10 : "f_COFF",    # Common Object File Format (COFF)
    11 : "f_PE",      # Portable Executable (PE)
    12 : "f_OMF",     # Object Module Format
    13 : "f_SREC",    # R-records
    14 : "f_ZIP",     # ZIP file (this file is never loaded to IDA database)
    15 : "f_OMFLIB",  # Library of OMF Modules
    16 : "f_AR",      # ar library
    17 : "f_LOADER",  # file is loaded using LOADER DLL
    18 : "f_ELF",     # Executable and Linkable Format (ELF)
    19 : "f_W32RUN",  # Watcom DOS32 Extender (W32RUN)
    20 : "f_AOUT",    # Linux a.out (AOUT)
    21 : "f_PRC",     # PalmPilot program file
    22 : "f_EXE",     # MS DOS EXE File
    23 : "f_COM",     # MS DOS COM File
    24 : "f_AIXAR",   # AIX ar library
    25 : "f_MACHO",   # Mac OS X
}

def FindStringEA():
    """Return the address of the decompression error string, or -1."""
    searchstr = "mkbundle: Error %d decompressing data for %s\n"
    # Variant without the prefix; the spelling is kept exactly as the script has it.
    searchstr2 = "Error %d decompresing data for %s\n"
    # Do not use the default setup, we'll call setup() ourselves.
    s = idautils.Strings(default_setup = False)
    # We want C & Unicode strings, and *only* existing strings.
    s.setup(strtypes = idautils.Strings.STR_C | idautils.Strings.STR_UNICODE,
            ignore_instructions = True,
            display_only_existing_strings = True)
    # Loop through the strings.
    for i, v in enumerate(s):
        if not v:
            # Skip a string that could not be retrieved instead of giving up.
            continue
        if str(v) == searchstr or str(v) == searchstr2:
            return v.ea
    return -1

def FindUnFunction(StringEA):
    """Return the first function that references the string."""
    for ref in idautils.DataRefsTo(StringEA):
        f = idaapi.get_func(ref)
        if f:
            return f
    return None

def FindDataOffset(FuncEA):
    """Return the first data reference made inside the function."""
    for funcitem in idautils.FuncItems(FuncEA):
        for dataref in idautils.DataRefsFrom(funcitem):
            return dataref
    return None

def GetWordAccessors():
    """Return (word size, make-word function, read-word function) for the target."""
    if Is64Bit:
        return 8, idc.MakeQword, idc.Qword
    return 4, idc.MakeDword, idc.Dword

def GetStructOffsetList(DataOffset):
    """Read the NULL-terminated pointer array that starts at DataOffset."""
    addv, mf, vf = GetWordAccessors()
    AsmListStructListOffsetList = []
    currentoffset = DataOffset
    while True:
        mf(currentoffset)
        currentvalue = vf(currentoffset)
        if currentvalue == 0:
            break
        AsmListStructListOffsetList.append(currentvalue)
        currentoffset += addv
    return AsmListStructListOffsetList

def MakeFileItemStruct(FileItemStructOffset):
    """Decode one CompressedAssembly record, one machine word per field."""
    addv, mf, vf = GetWordAccessors()
    offset = FileItemStructOffset

    mf(offset)
    FileNameOffset = vf(offset)
    FileName = idc.GetString(FileNameOffset)
    offset += addv

    mf(offset)
    FileDataOffset = vf(offset)
    offset += addv

    mf(offset)
    FileSize = vf(offset)
    FileSizeOffset = offset
    offset += addv

    mf(offset)
    FileCompressedSize = vf(offset)
    FileCompressedSizeOffset = offset

    FileDataCompressed = idc.GetManyBytes(FileDataOffset, FileCompressedSize)
    # A gzip stream starts with 1f 8b 08; anything else is treated as zlib.
    if FileDataCompressed[0:3] == '\x1f\x8b\x08':
        IsGZip = 1
    else:
        IsGZip = 0

    return {
        "FileItemStructOffset": FileItemStructOffset,
        "FileNameOffset": FileNameOffset,
        "FileName": FileName,
        "FileDataOffset": FileDataOffset,
        "FileSize": FileSize,
        "FileSizeOffset": FileSizeOffset,
        "FileCompressedSizeOffset": FileCompressedSizeOffset,
        "FileCompressedSize": FileCompressedSize,
        "IsGZip": IsGZip,
        "FileDataCompressed": FileDataCompressed
    }

# Python Cookbook: a friendlier makedir function than the built-in one
# from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/82465
def _mkdir(newdir):
    """works the way a good mkdir should :)
        - already exists, silently complete
        - regular file in the way, raise an exception
        - parent directory(ies) does not exist, make them as well
    """
    if os.path.isdir(newdir):
        pass
    elif os.path.isfile(newdir):
        raise OSError("a file with the same name as the desired "
                      "dir, '%s', already exists." % newdir)
    else:
        head, tail = os.path.split(newdir)
        if head and not os.path.isdir(head):
            _mkdir(head)
        if tail:
            os.mkdir(newdir)

def DecompressZLib(Data, Path):
    data2 = zlib.decompress(Data)
    with open(Path, 'wb') as f:
        f.write(data2)

def DecompressGzipTo(Data, Path):
    compressedstream = StringIO.StringIO(Data)
    gziper = gzip.GzipFile(fileobj=compressedstream)
    data2 = gziper.read()  # read the decompressed data
    with open(Path, 'wb') as f:
        f.write(data2)

def DecompressFileTo(FileItem, OutputDir):
    # os.path.join keeps the output path valid on both Windows and Mac.
    newpath = os.path.join(OutputDir, FileItem["FileName"])
    if FileItem["IsGZip"] == 1:
        DecompressGzipTo(FileItem["FileDataCompressed"], newpath)
    else:
        DecompressZLib(FileItem["FileDataCompressed"], newpath)

def main():
    global Is64Bit
    global InputFileType

    print("Input File:{}".format(idc.GetInputFile()))
    print("Input File Path:{}".format(idc.GetInputFilePath()))
    print("Idb File Path:{}".format(idc.GetIdbPath()))
    print("cpu_name:{}".format(idc.GetShortPrm(idc.INF_PROCNAME).lower()))

    # ida.hpp filetype_t: f_PE=11, f_MACHO=25
    InputFileType = idc.GetShortPrm(idc.INF_FILETYPE)
    print("InputFileType:{}".format(filetype_t_map.get(InputFileType, None)))
    if InputFileType != InputFileType_EXE and InputFileType != InputFileType_MachO:
        print("Error, input file type must be PE or MachO!")
        return

    Is64Bit = (idc.GetShortPrm(idc.INF_LFLAGS) & idc.LFLG_64BIT) == idc.LFLG_64BIT
    print("Is64Bit:{}".format(Is64Bit))

    OutputDir = '{}_{:%Y%m%d%H%M%S%f}'.format(idc.GetInputFilePath(), datetime.now())
    _mkdir(OutputDir)
    print("OutputDir:{}".format(OutputDir))

    StringEA = FindStringEA()
    if StringEA == -1:
        print("Can't find StringEA!")
        return

    Func = FindUnFunction(StringEA)
    if not Func:
        print("Can't find Func!")
        return

    FuncName = idc.GetFunctionName(Func.startEA)
    print("Found Data Function:" + FuncName)

    DataOffset = FindDataOffset(Func.startEA)
    if not DataOffset:
        print("Can't find DataOffset!")
        return
    print("DataOffset:0x{:016X}".format(DataOffset))

    StructOffsetList = GetStructOffsetList(DataOffset)
    if len(StructOffsetList) == 0:
        print("Can't find StructOffsetList!")
        return

    FileItems = []
    for StructOffsetItem in StructOffsetList:
        FileItems.append(MakeFileItemStruct(StructOffsetItem))

    for FileItem in FileItems:
        print("FileItemStructOffset:{:016X} FileNameOffset:{:016X} FileDataOffset:{:016X} "
              "FileSize:{:016X} FileCompressedSize:{:016X} IsGZip:{} FileName:{}".format(
                  FileItem["FileItemStructOffset"],
                  FileItem["FileNameOffset"],
                  FileItem["FileDataOffset"],
                  FileItem["FileSize"],
                  FileItem["FileCompressedSize"],
                  FileItem["IsGZip"],
                  FileItem["FileName"]))
        DecompressFileTo(FileItem, OutputDir)

if __name__ == "__main__":
    main()
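The compressed data comes in two formats, and newer and older versions differ. The first few bytes of the data tell you which one is in use: the script above treats data starting with 1f 8b 08 as gzip and everything else as a zlib stream.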
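That header check can also be tried outside IDA. Below is a minimal standalone Python 3 sketch. The function name `decompress_bundled` is hypothetical, and it assumes only the two formats mentioned here ever appear.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"   # gzip ID bytes followed by the deflate method byte

def decompress_bundled(data):
    """Pick gzip or zlib from the leading bytes and return the raw assembly."""
    if data[:3] == GZIP_MAGIC:
        return gzip.decompress(data)
    # Assumption: anything that is not gzip is a zlib stream.
    return zlib.decompress(data)

sample = b"MZ" + bytes(range(64))
from_gzip = decompress_bundled(gzip.compress(sample))
from_zlib = decompress_bundled(zlib.compress(sample))
print(from_gzip == sample, from_zlib == sample)
```

If the data is in neither format, `zlib.decompress` raises `zlib.error`, which is a reasonable signal that the table or the sizes were read incorrectly.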