Name mangling && Name demangling
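Before explaining how Go can call a Windows DLL through SWIG, we need to understand one concept: name mangling (also called a decorated name). Baidu Translate renders "Name Mangling" as 名字改编 or 名称重整, and "Decorated Name", Microsoft's term, as 修饰名. Judging by both the English terms and their translations, "name mangling" is the more fitting name, and it is also the term commonly used throughout the C++ world. C++ programmers usually do not need to know about name mangling, but when a project builds or calls dynamic-link libraries, some understanding of it may be necessary. Let's start with an example.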
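Open VS2010 and choose File > New > Project. Select Visual C++ > Win32, pick Win32 Console Application on the right, and create a project named Simple.
Click OK, then Next.
Select DLL, check Export symbols, and click Finish.
Below is the generated Simple.h file: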
// The following ifdef block is the standard way of creating macros which make exporting
// from a DLL simpler. All files within this DLL are compiled with the SIMPLE_EXPORTS
// symbol defined on the command line. This symbol should not be defined on any project
// that uses this DLL. This way any other project whose source files include this file see
// SIMPLE_API functions as being imported from a DLL, whereas this DLL sees symbols
// defined with this macro as being exported.
#ifdef SIMPLE_EXPORTS
#define SIMPLE_API __declspec(dllexport)
#else
#define SIMPLE_API __declspec(dllimport)
#endif
// This class is exported from the Simple.dll
class SIMPLE_API CSimple {
public:
CSimple(void);
// TODO: add your methods here.
};
extern SIMPLE_API int nSimple;
SIMPLE_API int fnSimple(void);
The generated code exports one class, one function, and one variable. Add a simple method, SayHello, to the CSimple class:
#ifdef SIMPLE_EXPORTS
#define SIMPLE_API __declspec(dllexport)
#else
#define SIMPLE_API __declspec(dllimport)
#endif
// This class is exported from the Simple.dll
class SIMPLE_API CSimple {
public:
CSimple(void);
// TODO: add your methods here.
int SayHello();
};
extern SIMPLE_API int nSimple;
SIMPLE_API int fnSimple(void);
Simple.cpp:
#include "stdafx.h"
#include "Simple.h"
// This is an example of an exported variable
SIMPLE_API int nSimple=0;
// This is an example of an exported function.
SIMPLE_API int fnSimple(void)
{
return 42;
}
// This is the constructor of a class that has been exported.
// see Simple.h for the class definition
CSimple::CSimple()
{
return;
}
int CSimple::SayHello()
{
return 0;
}
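Add an x64 platform in Configuration Manager.
After building, the main output files in D:\Sample\Simple\x64\Debug are Simple.dll, Simple.lib, and Simple.pdb.
Normally, calling the DLL from C++ only requires adding the library directory and Simple.lib to the project's linker properties (Additional Library Directories and Additional Dependencies). For ordinary use, that is all there is to building and calling a DLL.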
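Windows developers often turn to the well-known tool Dependency Walker when calling LoadLibrary and GetProcAddress, or when tracking down a DLL that is missing at runtime. It shows a module's dependencies as well as its imported and exported functions. Open the 64-bit DLL built above with the x64 version of Dependency Walker.
The export pane shows that Simple.dll exports 5 functions (or variables), and every name begins with ?. The code, however, exports only 4 items: the two CSimple member functions CSimple::CSimple and CSimple::SayHello, the function fnSimple, and the variable nSimple. Dependency Walker therefore lists one extra entry. The names suggest that the fifth entry is the variable nSimple, the fourth is fnSimple, and the third is CSimple::SayHello. The other two are harder to read, but one of them should be CSimple::CSimple.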
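The VS installation directory (C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64) contains several small utilities, one of which is undname.exe. Its name presumably stands for "undecorate name", the opposite of "decorated". The tool turns an exported (decorated) name back into a form programmers can read. Running undname on entries 1 to 5 in turn gives: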
| No. | Exported function name | undname output |
|---|---|---|
| 1 | ??0CSimple@@QEAA@XZ | "public: __cdecl CSimple::CSimple(void) __ptr64" |
| 2 | ??4CSimple@@QEAAAEAV0@AEBV0@@Z | "public: class CSimple & __ptr64 __cdecl CSimple::operator=(class CSimple const & __ptr64) __ptr64" |
| 3 | ?SayHello@CSimple@@QEAAHXZ | "public: int __cdecl CSimple::SayHello(void) __ptr64" |
| 4 | ?fnSimple@@YAHXZ | "int __cdecl fnSimple(void)" |
| 5 | ?nSimple@@3HA | "int nSimple" |
After compilation, the member function CSimple::CSimple() became ??0CSimple@@QEAA@XZ, and the member function CSimple::SayHello became ?SayHello@CSimple@@QEAAHXZ. Put simply, this transformation by the compiler is name mangling.
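To make the structure of these decorated names more concrete, here is a minimal sketch that rebuilds entries 3, 4, and 5 of the table from their parts. Microsoft does not publish the scheme. The codes used here are widely circulated reverse-engineered values, and using them is an assumption on my part:
- Y: global function.
- Q, E, A, A: public non-virtual member on x64, __ptr64 this, non-const this, __cdecl.
- 3: global variable.
- H: int, X: void.

The function names mangle_msvc_* are hypothetical. The sketch covers only builtin types whose code is one character, so MSVC's back-references never come into play. Special members such as ??0 (constructor) and ??4 (operator=) are not handled.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Single-character MSVC codes for a few builtin types (assumed, reverse-engineered).
// Multi-character codes such as bool ("_N") are left out on purpose: repeats of
// those are replaced by back-references, which this sketch does not implement.
std::string msvc_type_code(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"void", "X"}, {"char", "D"}, {"short", "F"}, {"int", "H"},
        {"unsigned int", "I"}, {"long", "J"}, {"float", "M"}, {"double", "N"}};
    auto it = codes.find(type);
    if (it == codes.end()) throw std::invalid_argument("unsupported type: " + type);
    return it->second;
}

// Names are written innermost first, each followed by '@', and the list ends
// with an extra '@': {"CSimple", "SayHello"} -> "SayHello@CSimple@@".
std::string msvc_qualified_name(const std::vector<std::string>& qualified) {
    assert(!qualified.empty() && "need at least one name component");
    std::string out;
    for (auto it = qualified.rbegin(); it != qualified.rend(); ++it) out += *it + "@";
    return out + "@";
}

// Pass {} for "(void)", which is encoded as "X"; a non-empty list ends with '@'.
std::string msvc_params(const std::vector<std::string>& params) {
    if (params.empty()) return "X";
    std::string out;
    for (const auto& p : params) out += msvc_type_code(p);
    return out + "@";
}

// Free function: "Y" = global function, "A" = __cdecl, then the return type,
// the parameters, and "Z" (no exception specification).
std::string mangle_msvc_function(const std::vector<std::string>& qualified,
                                 const std::string& ret,
                                 const std::vector<std::string>& params) {
    return "?" + msvc_qualified_name(qualified) + "YA" + msvc_type_code(ret) +
           msvc_params(params) + "Z";
}

// x64 public non-virtual member function: "Q" = public member, "E" = __ptr64
// this pointer, "A" = non-const this, "A" = __cdecl.
std::string mangle_msvc_x64_member(const std::vector<std::string>& qualified,
                                   const std::string& ret,
                                   const std::vector<std::string>& params) {
    return "?" + msvc_qualified_name(qualified) + "QEAA" + msvc_type_code(ret) +
           msvc_params(params) + "Z";
}

// Global variable: "3" = global data, then the type, then "A" = not const/volatile.
std::string mangle_msvc_variable(const std::vector<std::string>& qualified,
                                 const std::string& type) {
    return "?" + msvc_qualified_name(qualified) + "3" + msvc_type_code(type) + "A";
}
```

For example, mangle_msvc_function({"fnSimple"}, "int", {}) reproduces ?fnSimple@@YAHXZ. mangle_msvc_function({"Foo"}, "void", {"int"}) gives ?Foo@@YAXH@Z. Note that the return type (X, void) is part of MSVC's name for a free function.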
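Below are three paragraphs from a blog post by a developer abroad. They are not an official explanation, but in my view they explain name mangling very clearly. For a more rigorous treatment, see Wikipedia.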
Here is a quick reminder on what is name mangling and why it is used in C++. When an executable is started, some (many) of its symbols addresses are resolved during the startup - at runtime - because these addresses are not known during the static link (i.e. when the executable is generated). One way to resolve them is to search them by symbol name. This is typically what is done when "dlsym" is used, but it is also how the runtime linker resolves the address of the functions implemented in the dynamic libraries.
When developing in C, all public symbols of an executable are unique. It is not possible to have two different variables or functions with the same name. The only exceptions to this are local variables and static variables/functions. However such symbols are private to a function or a file and cannot be retrieved by name at runtime. So in C the symbols signature is the name that appears in the code. In this case the linker can find a function by using its name as the signature.
Things are not that simple in C++. The symbols names cannot just be the name of the functions or methods because the language allows polymorphism: A method can have several prototypes to operate on different kinds of data. For example it is possible to declare "void Foo(int)" and "void Foo(double)". In this case it is not possible to use the function name as a signature without a conflict. This is why name mangling is used: Some additional information is added to the symbol name to make it unique in the executable/library. One can see C++ mangling as a concatenation of the method name with its parameters types. This is a simplification because other things are used to generate a mangled name, but it is the general idea. Once names are mangled, it is possible to resolve symbols at runtime by name without conflict.
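In short: C++ allows function overloading, so name mangling is needed to keep symbol names unique. The C++ standard does not specify the mangling rules, and each compiler implements them differently. Even different versions of the same compiler, or the same compiler on different platforms, may mangle names differently (see the Wikipedia article "Name mangling").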
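The two most common C++ compilers are GNU gcc (g++) and Microsoft Visual C++, and their mangled names are completely different. g++ follows the mangling scheme of the Itanium C++ ABI. Because the GNU compiler is open source, its mangling rules and code can be found online: in the gcc-mirror/gcc repository on GitHub, gcc/cp/mangle.c is the source of gcc's name mangling.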
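The quoted paragraphs describe mangling as roughly "the name plus its parameter types". A minimal sketch of that idea for a tiny subset of the Itanium C++ ABI scheme used by g++ follows. It handles free functions and non-const member functions whose parameters are builtin types. mangle_itanium and its helpers are hypothetical. Real g++ also handles templates, substitutions (S_ back-references), const members, constructors (C1/C2), and much more.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// <source-name> ::= <length><identifier>, e.g. "CSimple" -> "7CSimple".
std::string itanium_source_name(const std::string& id) {
    return std::to_string(id.size()) + id;
}

// One-letter codes the Itanium C++ ABI assigns to a few builtin types.
std::string itanium_builtin_code(const std::string& type) {
    static const std::map<std::string, std::string> codes = {
        {"void", "v"}, {"bool", "b"}, {"char", "c"}, {"int", "i"},
        {"unsigned int", "j"}, {"long", "l"}, {"float", "f"}, {"double", "d"}};
    auto it = codes.find(type);
    if (it == codes.end()) throw std::invalid_argument("unsupported type: " + type);
    return it->second;
}

// Hypothetical helper. `qualified` lists the name from the outermost scope
// inward: {"fnSimple"} for ::fnSimple, {"CSimple", "SayHello"} for
// CSimple::SayHello. `params` lists builtin parameter types; {} means "(void)".
std::string mangle_itanium(const std::vector<std::string>& qualified,
                           const std::vector<std::string>& params) {
    assert(!qualified.empty() && "need at least one name component");
    std::string out = "_Z";
    if (qualified.size() == 1) {
        out += itanium_source_name(qualified[0]);  // unscoped name
    } else {
        out += "N";  // nested name: N <names...> E
        for (const auto& part : qualified) out += itanium_source_name(part);
        out += "E";
    }
    if (params.empty()) {
        out += "v";  // an empty parameter list is spelled as a single void
    } else {
        for (const auto& p : params) out += itanium_builtin_code(p);
    }
    return out;
}
```

mangle_itanium({"Foo"}, {"int"}) yields _Z3Fooi and mangle_itanium({"Foo"}, {"double"}) yields _Z3Food, so the two overloads of Foo from the quoted paragraph get distinct symbols. The return type does not appear in these names.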
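Visual C++ is commercial, closed-source software, and its mangling rules are not published. Microsoft calls mangled names "decorated names". Official documentation is scarce, but the relevant pages can be found on MSDN.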
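With this basic understanding of name mangling, let's continue with the example and compile the code that VS2010 generated above into a DLL with g++.
Before compiling, install TDM-GCC on Windows. TDM-GCC is a Windows distribution of the gcc compiler; the official download page is http://tdm-gcc.tdragon.net/download.
Create a subfolder named gcc under D:\Sample\, copy Simple.h and Simple.cpp into D:\Sample\gcc, and comment out #include "stdafx.h" in Simple.cpp. Open a command prompt, change to D:\Sample\gcc, and run g++ -shared -Wall -o Simple.dll Simple.cpp -DSIMPLE_EXPORTS. This produces Simple.dll. Open the newly built Simple.dll in Dependency Walker again.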
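At first glance the gcc-built DLL also exports 5 functions (or variables), but that is not quite the case. Entries 3 and 4 have the same entry point, so they are one function exported under two names. In the Itanium C++ ABI, C1 in a constructor's mangled name denotes the complete-object constructor and C2 the base-object constructor. For a class without virtual bases the two do the same work, which is why they can share one entry point. The GNU toolchain also has a tool, c++filt, that does the same job as Microsoft's undname.exe: it demangles mangled names. Running c++filt on entries 1 to 5 in turn gives: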
| No. | Mangled name | c++filt output |
|---|---|---|
| 1 | _Z8fnSimplev | fnSimple() |
| 2 | _ZN7CSimple8SayHelloEv | CSimple::SayHello() |
| 3 | _ZN7CSimpleC1Ev | CSimple::CSimple() |
| 4 | _ZN7CSimpleC2Ev | CSimple::CSimple() |
| 5 | nSimple | nSimple |
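These two examples show that Visual C++ and the GNU compiler mangle names in completely different ways. Their demangling tools also report different amounts of information. undname's output includes the access level, return type, calling convention, and __ptr64 qualifiers. c++filt gives only the qualified name and the parameter list, which is still enough to tell functions from variables.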
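The demangling that c++filt performs is also available from code. The sketch below wraps abi::__cxa_demangle, declared in <cxxabi.h> and provided by GCC's libstdc++ and Clang's libc++abi. It is not available with MSVC, where DbgHelp's UnDecorateSymbolName plays a similar role (not shown here). The wrapper name demangle is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>   // std::free
#include <cxxabi.h>  // abi::__cxa_demangle (GCC and Clang only, not MSVC)
#include <string>

// Demangle an Itanium C++ ABI name (the scheme g++ uses), much like c++filt.
// If __cxa_demangle reports failure, for example because the input is not a
// mangled name at all, the input is returned unchanged, mirroring how
// c++filt echoes names it cannot demangle.
std::string demangle(const std::string& mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with
    // malloc and the caller must free() it. status: 0 = success,
    // -1 = allocation failure, -2 = not a valid mangled name, -3 = bad argument.
    char* raw = abi::__cxa_demangle(mangled.c_str(), nullptr, nullptr, &status);
    assert(status != 0 || raw != nullptr);  // success always yields a buffer
    if (status != 0) return mangled;
    std::string result(raw);
    std::free(raw);
    return result;
}
```

Applied to the names exported by the g++-built Simple.dll, demangle("_ZN7CSimple8SayHelloEv") returns CSimple::SayHello(), the same text c++filt prints.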
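For demangling, both Microsoft and GNU provide tools. For the reverse direction, going from a name to its mangled name, there seems to be no standalone tool, judging from my searches on Baidu and Google. The result can, however, be obtained indirectly with the help of the compiler's preprocessing; I will cover this in a later post.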
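The first example also shows that the VS2010 C++ compiler automatically generated a copy assignment operator (operator=) for the CSimple class. Readers of Effective C++, Third Edition, will recall Item 5, "Know what functions C++ silently writes and calls": "When is an empty class not an empty class? When C++ gets through with it. If you don't declare them yourself, compilers will declare their own versions of a copy constructor, a copy assignment operator, and a destructor. Furthermore, if you declare no constructors at all, compilers will also declare a default constructor for you. All these functions will be both public and inline." The item goes on to say that these functions are generated only if they are needed, that is, only if they are called.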
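One quick way to see the compiler-declared members from Item 5 is to query the C++11 traits in <type_traits> about a class that declares none of them. The class below is a hypothetical stand-in modeled on CSimple, not the DLL's header. The traits only confirm that the members are declared and usable. They say nothing about which members a compiler actually emits, let alone exports from a DLL. C++11 also added implicitly declared move operations, which the Third Edition (written for C++03) does not cover.

```cpp
#include <type_traits>

// Stand-in for the article's CSimple: one user-declared constructor and one
// ordinary member function. No copy constructor, copy assignment operator,
// or destructor is written by hand.
class CSimple {
public:
    CSimple() {}
    int SayHello() { return 0; }
};

// The compiler has nevertheless declared all three.
static_assert(std::is_copy_constructible<CSimple>::value, "implicit copy constructor");
static_assert(std::is_copy_assignable<CSimple>::value, "implicit copy assignment operator");
static_assert(std::is_destructible<CSimple>::value, "implicit destructor");

// They are trivial: copying or destroying a CSimple needs no user code.
static_assert(std::is_trivially_copy_assignable<CSimple>::value, "trivial operator=");
static_assert(std::is_trivially_destructible<CSimple>::value, "trivial destructor");

// A class that declares no constructors at all also gets a default constructor.
class Empty {};
static_assert(std::is_default_constructible<Empty>::value, "implicit default constructor");
```

Because these checks are static_asserts, simply compiling the file is the test: it builds only if every implicitly declared member exists.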
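When building a DLL, however, Microsoft's compiler does not seem to follow Item 5. Presumably this is because, while building the DLL, the compiler cannot know whether its callers will ever use the copy assignment operator. The GNU compiler does follow the item: the only exported class members are the constructor CSimple::CSimple() and CSimple::SayHello(). Put another way, in this case it is the GNU C++ compiler that behaves as Item 5 describes.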