C# to IL: 2. IL Basics

This chapter and the next few focus on a simple belief of ours: if you really want to
understand C# code in earnest, the best way of doing so is to understand the IL code
generated by the C# compiler.
So, we shall raise the curtain with a small C# program and then explain the IL code
generated by the compiler. In doing so, we kill two birds with one stone: firstly, we
unravel the mysteries of IL and secondly, we obtain a more intuitive understanding of the
C# programming language.
We will first show you a .cs file and then the program written in IL by the C# compiler,
whose output is the same as that of the .cs file. The output of the IL code is displayed
after it. This will enhance our understanding of not only C# but also IL. So, without much
ado, let's take the plunge.

The above code is generated by the IL disassembler, ildasm.
After executing ildasm on the exe file, we studied the IL code it produced. Subsequently,
we eliminated the parts of the code that did not add to our understanding of IL. These
consisted of some comments, directives, functions etc. The remaining IL code is presented
as close to the original as possible.

The advantage of mastering IL by studying the IL code itself is that we are learning from
the master, i.e. the C# compiler, how to write decent IL code. We cannot find a better
authority than the C# compiler to enlighten us about IL.
The rules for creating a static function abc remain the same as for any other function such
as Main or vijay. As abc is a static function, we have to use the static modifier in the
.method directive.
When we want to call a function, the following information has to be provided in the order
given below:
- the return data type.
- the class name.
- the function name to be called.
- the data types of the parameters.
The same rules also apply when we call the .ctor function from the base class. It is
mandatory to write the name of the class before the name of the function. In IL, no
assumptions are made about the name of the class; it does not default to the class we are
in while calling the function.
Thus, the above program first displays "hi" using the WriteLine function and then calls the
static function abc. This function too uses the WriteLine function to display "bye".
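
As a minimal sketch of the kind of program being described (it is our reconstruction, not
the original listing), the C# below prints "hi" and then calls the static function abc.
The IL in the comments is approximately what ildasm shows; the exact text depends on the
compiler version and build settings.

```csharp
using System;

public class zzz
{
    public static void Main()
    {
        Console.WriteLine("hi");
        // IL (roughly): call void zzz::abc()
        // i.e. return type, class name, function name, parameter types.
        abc();
    }

    // IL (roughly): .method public hidebysig static void abc() cil managed
    public static void abc()
    {
        Console.WriteLine("bye");
    }
}
```

Running it displays hi followed by bye, each on its own line.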

A static constructor is always called before any other code of its class is executed. In
C#, a static constructor is merely a static function with the same name as the class. In
IL, the name of the function changes to .cctor. You may also have observed that in the
earlier example, we got a free function called .ctor.
Whenever we have a class with no constructors, a free constructor with no parameters is
created. This free constructor is given the name .ctor. This knowledge should enhance our
ability as C# programmers, as we are now in a better position to comprehend what goes on
under the hood.
The static constructor gets called first and the function with the .entrypoint directive
gets called thereafter.
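
The order can be checked with a small sketch like the one below. The Log list is our own
addition for recording the sequence; it is not part of the original example.

```csharp
using System;
using System.Collections.Generic;

public class zzz
{
    // Hypothetical helper: records the order in which code runs.
    public static readonly List<string> Log = new List<string>();

    // Becomes the function .cctor in IL.
    static zzz()
    {
        Log.Add(".cctor");
    }

    // Carries the .entrypoint directive in IL.
    public static void Main()
    {
        Log.Add("Main");
        foreach (string s in Log)
            Console.WriteLine(s);
    }
}
```

The entry .cctor is recorded before Main, even though Main is where execution starts.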

The keyword new in C# gets converted to the assembler instruction newobj. This provides
evidence that IL is not a low level assembler, and that it can also create objects in memory.
The instruction newobj creates a new object in memory. Even in IL, we are shielded from
what new or newobj really does. This demonstrates that IL is not just another high level
language, but is designed in such a way that other modern languages can be compiled to
it.
The rules for using newobj are the same as those for calling a function. The full prototype
of the function is required. In this case, we are calling the constructor without any
parameters, hence the function .ctor is called. In the constructor, the WriteLine function
is called.
As we had promised earlier, we are going to explain the instruction ldarg.0 here. Whenever
we create an object that is an instance of a class, it contains two basic entities:
- functions
- fields or variables i.e. data.
When a function gets called, it does not know or care where it is being called from or who
is calling it. It receives all its parameters off the stack. There is no point in having
two copies of a function in memory: if a class contained a megabyte of code and every
object carried its own copy, each 'new' on it would occupy an additional megabyte of memory.
When new is called for the first time, memory gets allocated for the code and the variables.
But thereafter, with every call on new, fresh memory is allocated only for the variables.
Thus, if we have five instances of a class, there will be only one copy of the code, but five
separate copies of the variables.
Every non-static or instance function is passed a handle which indicates the location of the
variables of the object on which the function has been called. This handle is called the
this pointer. It is always passed as the first parameter to every instance function, so the
instruction ldarg.0, which loads parameter number zero, places 'this' on the stack. Since it
is always passed by default, it is not mentioned in the parameter list of a function.
All the action takes place on the stack. The instruction pop removes whatever is on the top
of the stack. In this example, we use it to remove the instance of zzz that has been placed
on top of the stack by the newobj instruction.
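
A short sketch ties these points together. The field i and the instance function vijay are
our own illustration, and the IL in the comments is approximate.

```csharp
using System;

public class zzz
{
    public int i;                  // every instance gets its own copy of i

    public zzz()                   // named .ctor in IL
    {
        Console.WriteLine("hi");
    }

    public int vijay()             // instance function: 'this' is parameter 0
    {
        // IL (roughly): ldarg.0 ; ldfld int32 zzz::i ; ret
        return this.i;
    }

    public static void Main()
    {
        // IL (roughly): newobj instance void zzz::.ctor() ; pop
        new zzz();

        zzz a = new zzz();
        zzz b = new zzz();
        a.i = 1;
        b.i = 2;
        // One copy of the code of vijay, two separate copies of i.
        Console.WriteLine(a.vijay());
        Console.WriteLine(b.vijay());
    }
}
```

The same function vijay returns a different value for a and b because each call receives a
different this pointer.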

The static constructor always gets called first whereas the instance constructor gets called
only after new. IL enforces this sequence of execution. The assembler does not insist on a
call to the base class constructor, although code that is to pass verification must make
it. Hence, to save space in our book, we have not shown its code in all the programs.
In some cases, if we do not include the code of a constructor, the programs do not work.
Only in these cases, the code of the constructor has been included. The static constructor
does not call the base class constructor. Also, 'this' is not passed to static functions.

We have created two variables called i and j in our function Main in the C# program. They
are local variables and are created on the stack. On conversion to IL, if you notice, the
names of the variables are lost.

The variables get created in IL through the .locals directive, where they appear under
generated names, beginning with V_0, V_1 and so on. The data types are also altered from
int to int32 and from long to int64. The basic types in C# are aliases. They all get
converted to data types that IL understands.
The task on hand is to initialize the variable i to a value of 6. This value has to be loaded
on the stack or evaluation stack. The instruction to do so is ldc.i4.value. An i4 takes up
four bytes of memory.
The value mentioned in the syntax above is the constant that has to be put on the stack.
After the value 6 has been loaded on to the stack, we now need to initialize the variable i to
this value. The variable i has been renamed as V_0 and is the first variable in the locals
directive.
The instruction stloc.0 takes the value present at the top of the stack i.e. 6 and initializes
the variable V_0 to it. The process of initializing a variable is definitely more involved
than it looks in C#.
The second ldc instruction copies the value of 7 onto the stack. On a 32 bit machine,
memory is handled in units of 32 bits i.e. 4 bytes. In the same vein, on a 64 bit machine,
it is handled in units of 64 bits i.e. 8 bytes.
The number 7 is stored as a constant and requires only 4 bytes, but a long requires 8
bytes. Thus, we need to convert the 4 bytes to 8 bytes. The instruction conv.i8 is used for
this purpose. It places an 8 byte number on the stack. Only after doing so, we use stloc.1
to initialize the second variable V_1 to the value of 7.
Thus, the ldc series is used to place a constant number on the stack and stloc is utilized to
pick up what is on the stack and initialize a local to that value.
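
The two initializations can be sketched as follows. The function name abc and the final
addition are our own; the IL in the comments is approximately what a debug build produces,
and an optimizing build may differ.

```csharp
using System;

public class zzz
{
    public static long abc()
    {
        int i = 6;     // IL (roughly): ldc.i4.6 ; stloc.0
        long j = 7;    // IL (roughly): ldc.i4.7 ; conv.i8 ; stloc.1
        // i is widened to 8 bytes before the addition.
        return i + j;
    }

    public static void Main()
    {
        Console.WriteLine(abc());
    }
}
```

The program displays 13.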

 

Now you will finally be able to see the light at the end of the tunnel and understand as to
why we wanted you to read this book in the first place.
Let us understand the above code, one field at a time. We have created a variable i that is
static and initialized it to the value of 6. Since the variable i has not been given an
access modifier, it is private by default. The static modifier of C# is applicable to
variables in IL also.
The real action begins now. The variable needs to be assigned an initial value. This value
must be assigned in the static constructor only, because the variable is static. We employ
ldc to place the value 6 on the stack. Note that the locals directive is not used here.
To initialize i, we use the instruction stsfld that looks for a value on top of the stack.
The instruction is followed by the data type of the field, which tells it how many bytes it
has to pick up from the stack to initialize the static variable. In this case, the type is
int32, i.e. 4 bytes.
The variable name is preceded by the name of the class. This is in contrast to the syntax of
local variables.
For the instance variable j, since its access modifier was public in C#, on conversion to IL,
its access modifier is retained as public. Since it is an instance variable, its value gets
initialized in the instance constructor. The instruction used here is stfld and not stsfld.
Here we need 8 bytes of the stack.
The rest of the code remains the same as before. Thus, we can see that the instruction
stloc is used to initialize locals, stfld is used to initialize instance fields and stsfld
is used to initialize static fields.

The main purpose of the above example is to verify whether the variable is initialized first
or the code contained in a constructor gets called first. The IL output demonstrates very
lucidly that first all the variables get initialized and thereafter, the code in a
constructor gets executed.
You may have also noticed that the base class constructor is called after the variables
have been initialized, and that only then does the code written in our constructor get
executed.
This nugget of knowledge is sure to enhance your understanding of C# and IL.
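
The sequence can be observed directly from C#. In the sketch below, the base class yyy, the
Log list and the helper vijay are hypothetical names introduced only to record the order.

```csharp
using System;
using System.Collections.Generic;

public class yyy
{
    // Hypothetical helper: records the order in which code runs.
    public static readonly List<string> Log = new List<string>();

    public yyy()
    {
        Log.Add("base class constructor");
    }
}

public class zzz : yyy
{
    // The initializer is compiled into the start of zzz's .ctor.
    public long j = vijay(7);

    public zzz()
    {
        Log.Add("code in the constructor");
    }

    static long vijay(long value)
    {
        Log.Add("field initializer");
        return value;
    }

    public static void Main()
    {
        new zzz();
        foreach (string s in Log)
            Console.WriteLine(s);
    }
}
```

The log reads: field initializer, base class constructor, code in the constructor.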

We can print a number instead of a string by using an overloaded version of the WriteLine
function.

First, we push the value 10 onto the stack using the ldc family. Observe carefully, the
instruction now is ldc.i4.s followed by the value 10. The plain ldc.i4 instruction is
followed by a constant that takes 4 bytes in memory, but with the .s suffix the constant
takes only one byte.
Then the C# compiler calls the correct overloaded version of the WriteLine function, which
accepts an int32 value from the stack.
This is similar to printing strings.

We shall now delve into how to print a number on the screen using a format string.
The WriteLine function accepts a string followed by a variable number of objects. The {0}
prints the first object after the comma. Even though there is no variable in the C# code, on
conversion to IL code, a variable of type int32 is created.
The string {0} is loaded on the stack using our trustworthy ldstr. Then, we place the
number that is to be passed as a parameter to the WriteLine function, on the stack. To do
so, we use ldc.i4.s which loads the constant value on the stack. After this, we initialize the
variable V_0 to 20 with the stloc.0 instruction, and then ldloca.s loads the address of the
local variable on the stack.
The major roadblock that we experience here is that the WriteLine function accepts a string
followed by an object as the next parameter. In this case, the variable is of value type and
not reference type.
An int32 is a value type variable whereas the WriteLine function wants a full-fledged object
of a reference type.
How do we solve the dilemma of converting a value type into a reference type?
As informed earlier, we use the instruction ldloca.s to load the address of the local variable
V_0 onto the stack. Thus, our stack contains a string followed by the address of a value
type variable, V_0.
Next, we call an instruction called box. There are only two types of variables in the .NET
world i.e. value types and reference types. Boxing is the method that .NET uses to convert
a value type variable into a reference type variable.
The box instruction takes an unboxed or value type variable and converts it into a boxed or
reference type variable. In this listing, the box instruction is given the address of a
value type on the stack and allocates space on the heap for its equivalent reference type.
In the CLI as standardized in ECMA-335, box takes the value itself rather than its address,
so IL produced by a current compiler loads the value, not its address, before box.
The heap is an area of memory used to store reference types. The values on the stack
disappear at the end of a function, but the heap is available for a much longer duration.
Once this space is allocated, the box instruction initializes the instance fields of the
reference object. Then, it places the memory location in the heap of this newly constructed
object on the stack. The box instruction here requires the memory location of a local
variable on the stack.
The constant stored on the stack has no physical address. Thus, the variable V_0 is
created to provide the memory location.
This boxed version on the heap is similar to the reference type variable that we are familiar
with. It is handled as a plain System.Object, so its specific value is not directly
accessible. To access it, we need to unbox it first. The WriteLine function does this
internally.
The data type named in the box instruction must be the same as that of the variable whose
address has been placed on the stack. We will subsequently explain these details.
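
Boxing can also be observed from C# itself. The sketch below is our own illustration; the
IL in the comments is approximately what a current compiler emits (assembly prefixes
omitted), which loads the value rather than its address before box.

```csharp
using System;

public class zzz
{
    public static int abc()
    {
        int i = 20;
        object o = i;    // boxing: IL (roughly) ldloc.0 ; box System.Int32 ; stloc.1
        i = 30;          // changes the local only, not the boxed copy on the heap
        return (int)o;   // unboxing: IL (roughly) ldloc.1 ; unbox.any System.Int32
    }

    public static void Main()
    {
        // The int returned by abc is boxed to match the object parameter of WriteLine.
        Console.WriteLine("{0}", abc());

        object o = 20;
        // Seen through an object reference, the boxed value still knows its own type.
        Console.WriteLine(o.GetType());
    }
}
```

The program displays 20 and then System.Int32. The boxed copy keeps the value 20 even
after the local i has been changed to 30.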

The above code is used to display the value of a static variable. The .cctor function
initializes the static variable to a value of 10. Then, the string {0} is stored on the stack.
The instruction ldsflda loads the address of a static variable of a certain data type on the
stack. Then, as usual, box takes over. The explanation regarding the functionality of box
given above is relevant here also.

Static variables in IL work in the same way as instance variables. The only difference is
that they have their own set of instructions. Instructions like box need a memory location
on the stack, without discriminating between static and instance variables.

The only variation that we indulged in from the earlier program is that we have removed
the static constructor. All static variables and instance variables get initialized internally
to zero. Thus, IL does not generate any error. Internally, even before the static constructor
gets called, the field i is assigned an initial value of zero.
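
A small sketch confirms this from the C# side. The instance field j is our own addition to
show that instance fields behave the same way; the compiler may warn that the fields are
never assigned, which is the point of the example.

```csharp
using System;

public class zzz
{
    public static int i;   // no initializer and no static constructor
    public long j;         // no initializer and no explicit instance constructor

    public static void Main()
    {
        Console.WriteLine("{0}", i);
        Console.WriteLine("{0}", new zzz().j);
    }
}
```

Both lines of output are 0.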

We have initialised the local i to a value of 10. This cannot be done in the constructor since
the variable i has been created on the stack. Then, stloc.0 has been used to assign the
value of 10 to V_0. Thereafter, ldloc.0 has been utilised to place the variable V_0 on the
stack, so that it is available to the WriteLine function.
The WriteLine function thereafter displays the value on the screen. A field and a local
behave in a similar manner, except that they use separate sets of instructions.

In C#, all local variables have to be initialised, or else the compiler generates an error
message. Here, even though we have eliminated the ldc and stloc instructions from the IL,
no error is generated at runtime. Instead, a very large number is displayed.
The variable V_0 has not been initialised to any value. It was created on the stack and
contained whatever value was available at the memory location assigned to it. On your
machine, the output will be very different from ours.
In a similar situation, the C# compiler will give you an error and not allow you to proceed
further, because the variable has not been initialized. IL, on the other hand, is a strange
kettle of fish. It is much more lenient in its outlook. It does very few error or sanity checks
on the source code. This has its drawback, meaning that the programmer has to be much more
responsible and careful while using IL.

  

In the above example, a static variable has been initialised inside a function and not at the
time of its creation, as seen earlier. The function vijay now holds the code that would
otherwise be present in the static constructor.
Placing a value on the stack and then storing it with stsfld or stfld, as shown above, is the
only way to initialize a static or an instance variable.

The above program demonstrates as to how we can call a function with a single parameter.
The rules for placing parameters on the stack are similar to those for the WriteLine
function.
Now let us comprehend as to how a function receives parameters from the stack.
We begin by stating the data type and parameter name in the function declaration. This is
similar to the workings in C#.
Next, we use the instruction ldarga.s to load the address of the parameter i onto the stack.
box will then convert this value type into an object, and finally the WriteLine function
uses these values to display the output on the screen.

  

In the above example, we have converted an int into an object because the WriteLine
function requires the parameter to be of this data type.
The only method of achieving this conversion is by using the box instruction. The box
instruction converts an int into an object.
In the function abc, we accept a System.Object and we use the instruction ldarg and not
ldarga. The reason being, we require the value of the parameter and not its address. The
number after the dot signifies the parameter number. In order to place the values of
parameters on the stack, a new instruction is required.
Thus, IL handles locals, fields and parameters with their own set of instructions.
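
The following sketch, which is our reconstruction rather than the original listing, passes
an int to a static function abc that accepts an object. The IL in the comments is
approximate, with assembly prefixes omitted.

```csharp
using System;

public class zzz
{
    public static void abc(object o)
    {
        // abc is static, so parameter 0 is o itself and not 'this'.
        // IL (roughly): ldarg.0 ; call void System.Console::WriteLine(object)
        Console.WriteLine(o);
    }

    public static void Main()
    {
        // IL (roughly): ldc.i4.s 10 ; box System.Int32 ; call void zzz::abc(object)
        abc(10);
    }
}
```

The program displays 10. The conversion from int to object happens at the call site, before
abc ever sees the parameter.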

Functions return values. Here, a static function abc has been called. We know from the
function's signature that it returns an int. Return values are stored on the stack.
Thus, the stloc.1 instruction picks up the value on the stack and places it in the local V_1.
In this specific case, it is the return value of the function.
newobj is also like a function. It returns an object which, in our case, is an instance of
the class zzz, and puts it on the stack.

The stloc instruction has been used repeatedly to initialize all our local variables. Just to
refresh your memory, ldloc does the reverse of this process.
A function has to just place a value on the stack using the trustworthy ldc and then cease
execution using the ret instruction.
Thus, the stack has a dual role to play:
- It is used to pass values to a function.
- It receives the return values of the functions.
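
As a minimal sketch of a function returning a value (the function vijay is our own
addition; the IL in the comments is approximately what a debug build produces):

```csharp
using System;

public class zzz
{
    public static int abc()
    {
        // IL (roughly): ldc.i4.s 20 ; ret
        return 20;
    }

    public static int vijay()
    {
        // IL (roughly): call int32 zzz::abc() ; stloc.0
        int i = abc();
        // IL (roughly): ldloc.0 ; ret
        return i;
    }

    public static void Main()
    {
        Console.WriteLine(vijay());
    }
}
```

The program displays 20. The value travels from abc to vijay, and from vijay to Main, only
by way of the stack.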

The only novelty that has been introduced in the above example is that the return value of
the function abc has been stored in an instance variable.
- stloc assigns the value on the stack to a local variable.
- ldloc, on the other hand, places the value of a local variable on the stack.
At first sight, it is not clear why the object that looks like zzz has to be put on the stack
again, especially since abc is a static function and not an instance function. Mind you,
static functions are not passed the this pointer on the stack. The object is not meant for
abc at all: the instruction stfld expects the object whose field is to be set on the stack,
below the value to be stored.
Thereafter, the function abc is called, which places the value 20 on the stack. The
instruction stfld picks up the value 20 and the object from the stack, and initializes the
instance variable i with this value.
Local and instance variables are handled in a similar manner except that the instructions
for their initialization are different.
The instruction ldfld does the reverse of what stfld does. It places the value of an instance
variable on the stack to make it available for the WriteLine function.
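
A sketch of such a program, reconstructed by us with approximate IL in the comments, shows
the order in which values reach the stack:

```csharp
using System;

public class zzz
{
    public int i;

    public static int abc()
    {
        return 20;
    }

    public static void Main()
    {
        // IL (roughly): newobj instance void zzz::.ctor() ; stloc.0
        zzz a = new zzz();

        // IL (roughly): ldloc.0 ; call int32 zzz::abc() ; stfld int32 zzz::i
        // The object goes on the stack first, the return value of abc lands above it.
        a.i = abc();

        // IL (roughly): ldloc.0 ; ldfld int32 zzz::i ; call void System.Console::WriteLine(int32)
        Console.WriteLine(a.i);
    }
}
```

The program displays 20.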
