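To achieve the first and last goals, Solidity assembly provides high-level constructs such as for loops, switch statements and function calls. This makes it possible to write assembly programs that do not use explicit SWAP, DUP, JUMP and JUMPI statements, because the first two obfuscate the data flow and the last two obfuscate the control flow. Furthermore, functional statements such as `mul(add(x, y), 7)` are preferred over pure opcode statements such as `7 y x add mul`, because in the functional form it is much easier to see which operand is used for which opcode.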
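The two forms carry the same information. The following minimal Python sketch turns the functional form into the opcode form. It follows the codegen rule for FunctionalAssemblyExpression given in the pseudocode further down: the arguments are emitted in reverse order and the opcode comes last. The tuple representation and the name `to_opcode_stream` are illustrative assumptions and not part of Solidity.

```python
from typing import List, Tuple, Union

# Hypothetical expression model: a leaf is a string (identifier or literal),
# an application is an (opcode, [arguments]) tuple.
Expr = Union[str, Tuple[str, List["Expr"]]]


def to_opcode_stream(expr: Expr) -> List[str]:
    if isinstance(expr, str):
        # Leaf: pushed onto the stack as-is.
        return [expr]
    opcode, args = expr
    stream: List[str] = []
    # Arguments are generated in reverse order, so the first argument
    # ends up on top of the stack when the opcode executes.
    for arg in reversed(args):
        stream.extend(to_opcode_stream(arg))
    stream.append(opcode)
    return stream


functional = ("mul", [("add", ["x", "y"]), "7"])
print(" ".join(to_opcode_stream(functional)))  # 7 y x add mul
```

Reading the opcode stream requires mentally simulating the stack. The nested call shows the operand structure directly.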
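Example: we follow the compilation of the runtime bytecode of the following Solidity program down to desugared assembly:

```solidity
contract C {
  function f(uint x) returns (uint y) {
    y = 1;
    for (uint i = 0; i < x; i++)
      y = 2 * y;
  }
}
```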
The following assembly is generated:

```
{
  mstore(0x40, 0x60) // store the "free memory pointer"
  // function dispatcher
  switch div(calldataload(0), exp(2, 224))
  case 0xb3de648b {
    let (r) := f(calldataload(4))
    let ret := $allocate(0x20)
    mstore(ret, r)
    return(ret, 0x20)
  }
  default { revert(0, 0) }
  // memory allocator
  function $allocate(size) -> pos {
    pos := mload(0x40)
    mstore(0x40, add(pos, size))
  }
  // the contract function
  function f(x) -> y {
    y := 1
    for { let i := 0 } lt(i, x) { i := add(i, 1) } {
      y := mul(2, y)
    }
  }
}
```

After the desugaring phase it looks as follows:

```
{
  mstore(0x40, 0x60)
  {
    let $0 := div(calldataload(0), exp(2, 224))
    jumpi($case1, eq($0, 0xb3de648b))
    jump($caseDefault)
    $case1:
    {
      // the function call - we put return label and arguments on the stack
      $ret1 calldataload(4) jump(f)
      // This is unreachable code. Opcodes are added that mirror the
      // effect of the function on the stack height: Arguments are
      // removed and return values are introduced.
      pop pop
      let r := 0
      $ret1: // the actual return point
      $ret2 0x20 jump($allocate)
      pop pop let ret := 0
      $ret2:
      mstore(ret, r)
      return(ret, 0x20)
      // although it is useless, the jump is automatically inserted,
      // since the desugaring process is a purely syntactic operation that
      // does not analyze control-flow
      jump($endswitch)
    }
    $caseDefault:
    {
      revert(0, 0)
      jump($endswitch)
    }
    $endswitch:
  }
  jump($afterFunction)
  $allocate:
  {
    // we jump over the unreachable code that introduces the function arguments
    jump($start)
    let $retpos := 0 let size := 0
    $start:
    // output variables live in the same scope as the arguments and are
    // actually allocated.
    let pos := 0
    {
      pos := mload(0x40)
      mstore(0x40, add(pos, size))
    }
    // This code replaces the arguments by the return values and jumps back.
    swap1 pop swap1 jump
    // Again unreachable code that corrects stack height.
    0 0
  }
  f:
  {
    jump($start)
    let $retpos := 0 let x := 0
    $start:
    let y := 0
    {
      let i := 0
      $for_begin:
      jumpi($for_end, iszero(lt(i, x)))
      { y := mul(2, y) }
      $for_continue:
      { i := add(i, 1) }
      jump($for_begin)
      $for_end:
    } // Here, a pop instruction will be inserted for i
    swap1 pop swap1 jump
    0 0
  }
  $afterFunction:
  stop
}
```
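You can check the stack discipline of the desugared `$allocate` by hand. The following Python sketch is a toy model, not the EVM. The stack is a list with the top at the end, memory is a dict, and `run_allocate` is a hypothetical helper that mirrors the desugared body line by line, including the `swap1 pop swap1 jump` return sequence.

```python
# Toy model (not the EVM): the stack is a list with the top at the end,
# memory is a dict from addresses to words.

def swap1(stack):
    # SWAP1: exchange the top two stack items.
    stack[-1], stack[-2] = stack[-2], stack[-1]


def run_allocate(stack, memory):
    """Replay the desugared `$allocate` body and return the jump target."""
    # On entry the caller has pushed [..., $retpos, size].
    stack.append(0)                      # let pos := 0
    size = stack[-2]
    stack[-1] = memory.get(0x40, 0)      # pos := mload(0x40)
    memory[0x40] = stack[-1] + size      # mstore(0x40, add(pos, size))
    # swap1 pop swap1 jump: replace the argument by the return value,
    # then consume the return address as the jump target.
    swap1(stack)                         # [..., $retpos, pos, size]
    stack.pop()                          # [..., $retpos, pos]
    swap1(stack)                         # [..., pos, $retpos]
    return stack.pop()                   # jump pops its target


memory = {0x40: 0x60}        # mstore(0x40, 0x60) at the start of the program
stack = ["$ret2", 0x20]      # caller: `$ret2 0x20 jump($allocate)`
target = run_allocate(stack, memory)
print(target, hex(stack[-1]), hex(memory[0x40]))  # $ret2 0x60 0x80
```

After the call, the return value `pos` sits where the argument used to be, and the return label has been consumed. This net effect of "two items removed, one item introduced" is what the unreachable `pop pop let ret := 0` after the call site records for the stack-height bookkeeping.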
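The parser has the following tasks:

- Turn the byte stream into a token stream, discarding C++-style comments (a special comment exists for source references, but it is not explained here).
- Turn the token stream into an AST according to the grammar below.
- Register identifiers with the block they are defined in (as an annotation on the AST node) and note from which point on variables can be accessed.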
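The third task, registering identifiers with their block and the point from which they become visible, can be sketched as follows. This is a minimal Python sketch with hypothetical names, not Solidity's implementation. It assumes that a variable declared by `let` at statement k is visible from statement k + 1 on. It also assumes that labels are visible throughout their block, which matches the forward jumps such as `jump($start)` in the desugared example.

```python
class Block:
    """A lexical block; identifiers map to the statement index where they become visible."""

    def __init__(self, parent=None, position_in_parent=0):
        self.parent = parent
        # Index of the statement in the parent block that contains this block.
        self.position_in_parent = position_in_parent
        self.visible_from = {}

    def _register(self, name, index):
        if name in self.visible_from:
            raise ValueError(f"identifier {name!r} already declared in this block")
        self.visible_from[name] = index

    def declare_variable(self, name, statement_index):
        # `let x := ...` at statement k: x is usable from statement k + 1 on.
        self._register(name, statement_index + 1)

    def declare_label(self, name):
        # Assumption: labels may be referenced anywhere in their block.
        self._register(name, 0)

    def child(self, statement_index):
        # Nested block appearing as statement `statement_index` of this block.
        return Block(self, statement_index)

    def lookup(self, name, statement_index):
        """Return the block that declares `name`, as seen from `statement_index`, or None."""
        block, position = self, statement_index
        while block is not None:
            start = block.visible_from.get(name)
            if start is not None and start <= position:
                return block
            # Continue in the enclosing block, at the statement holding this block.
            position = block.position_in_parent
            block = block.parent
        return None


root = Block()
root.declare_variable("x", 0)            # statement 0: let x := 1
root.declare_label("$end")
inner = root.child(2)                     # statement 2 is a nested block
print(root.lookup("x", 0) is None)        # True: not visible in its own statement
print(inner.lookup("x", 0) is root)       # True: found in the enclosing block
print(root.lookup("$end", 0) is root)     # True: label usable before its definition
```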
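Whitespace is used to delimit tokens and consists of the characters space, tab and linefeed. Comments are regular JavaScript/C++ comments and are interpreted in the same way as whitespace.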
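A minimal Python tokenizer sketch for these rules follows. Whitespace only separates tokens, and `//` and `/* */` comments are skipped like whitespace. The token patterns are taken from the lexical rules of the grammar below. The set of punctuation tokens and the name `tokenize` are assumptions, and hex literals (`hex"..."`) are not covered.

```python
import re

TOKEN_RE = re.compile(
    r"""
    (?P<skip>[ \t\n]+|//[^\n]*|/\*.*?\*/)    # whitespace and comments
  | (?P<hexnum>0x[0-9a-fA-F]+)              # HexNumber (tried before DecimalNumber)
  | (?P<decnum>[0-9]+)                      # DecimalNumber
  | (?P<ident>[a-zA-Z_$][a-zA-Z_0-9]*)      # Identifier
  | (?P<string>"(?:[^"\r\n\\]|\\.)*")       # StringLiteral
  | (?P<punct>:=|=:|->|[{}(),:])            # assumed punctuation set
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(source: str):
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "skip":  # whitespace and comments produce no token
            tokens.append(match.group())
        pos = match.end()
    return tokens


src = "let x := add(0x20, y) // trailing comment\n/* block */ mstore(x, 1)"
print(tokenize(src))
# ['let', 'x', ':=', 'add', '(', '0x20', ',', 'y', ')', 'mstore', '(', 'x', ',', '1', ')']
```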
Grammar:

```
AssemblyBlock = '{' AssemblyItem* '}'
AssemblyItem =
    Identifier |
    AssemblyBlock |
    FunctionalAssemblyExpression |
    AssemblyLocalDefinition |
    FunctionalAssemblyAssignment |
    AssemblyAssignment |
    LabelDefinition |
    AssemblySwitch |
    AssemblyFunctionDefinition |
    AssemblyFor |
    'break' | 'continue' |
    SubAssembly | 'dataSize' '(' Identifier ')' |
    LinkerSymbol |
    'errorLabel' | 'bytecodeSize' |
    NumberLiteral | StringLiteral | HexLiteral
Identifier = [a-zA-Z_$] [a-zA-Z_0-9]*
FunctionalAssemblyExpression = Identifier '(' ( AssemblyItem ( ',' AssemblyItem )* )? ')'
AssemblyLocalDefinition = 'let' IdentifierOrList ':=' FunctionalAssemblyExpression
FunctionalAssemblyAssignment = IdentifierOrList ':=' FunctionalAssemblyExpression
IdentifierOrList = Identifier | '(' IdentifierList ')'
IdentifierList = Identifier ( ',' Identifier)*
AssemblyAssignment = '=:' Identifier
LabelDefinition = Identifier ':'
AssemblySwitch = 'switch' FunctionalAssemblyExpression AssemblyCase*
    ( 'default' AssemblyBlock )?
AssemblyCase = 'case' FunctionalAssemblyExpression AssemblyBlock
AssemblyFunctionDefinition = 'function' Identifier '(' IdentifierList? ')'
    ( '->' '(' IdentifierList ')' )? AssemblyBlock
AssemblyFor = 'for' ( AssemblyBlock | FunctionalAssemblyExpression )
    FunctionalAssemblyExpression ( AssemblyBlock | FunctionalAssemblyExpression ) AssemblyBlock
SubAssembly = 'assembly' Identifier AssemblyBlock
LinkerSymbol = 'linkerSymbol' '(' StringLiteral ')'
NumberLiteral = HexNumber | DecimalNumber
HexLiteral = 'hex' ('"' ([0-9a-fA-F]{2})* '"' | '\'' ([0-9a-fA-F]{2})* '\'')
StringLiteral = '"' ([^"\r\n\\] | '\\' .)* '"'
HexNumber = '0x' [0-9a-fA-F]+
DecimalNumber = [0-9]+
```
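To show how the FunctionalAssemblyExpression rule maps to a parser, here is a minimal recursive-descent sketch in Python. For brevity it restricts AssemblyItem to an Identifier, a NumberLiteral or another FunctionalAssemblyExpression. This restriction, the tuple AST and the names `Parser` and `parse` are assumptions of the sketch.

```python
import re
from typing import List, Tuple, Union

# AST: a leaf is a string; a call is (identifier, [arguments]).
Node = Union[str, Tuple[str, List["Node"]]]
TOKEN = re.compile(r"\s*(0x[0-9a-fA-F]+|[0-9]+|[a-zA-Z_$][a-zA-Z_0-9]*|[(),])")


def tokenize(text: str) -> List[str]:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text: str):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, got {self.peek()!r}")
        self.pos += 1

    def item(self) -> Node:
        token = self.peek()
        if token is None or token in ("(", ")", ","):
            raise SyntaxError(f"unexpected token {token!r}")
        self.pos += 1
        if token[0].isdigit() or self.peek() != "(":
            return token  # NumberLiteral or bare Identifier
        # Identifier '(' ( item ( ',' item )* )? ')'
        self.expect("(")
        args: List[Node] = []
        if self.peek() != ")":
            args.append(self.item())
            while self.peek() == ",":
                self.expect(",")
                args.append(self.item())
        self.expect(")")
        return (token, args)


def parse(text: str) -> Node:
    parser = Parser(text)
    node = parser.item()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return node


print(parse("mul(add(x, y), 7)"))  # ('mul', [('add', ['x', 'y']), '7'])
```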
An AST transformation removes the for, switch and function constructs. The result is still parseable by the same parser, but it does not use these constructs. If jumpdests are added that are only jumped to and not continued at, information about the stack content is added, unless no local variables of outer scopes are accessed or the stack height is the same as for the previous instruction.

Pseudocode:

```
desugar item: AST -> AST =
match item {
AssemblyFunctionDefinition('function' name '(' arg1, ..., argn ')' '->' '(' ret1, ..., retm ')' body) ->
  <name>:
  {
    jump($<name>_start)
    let $retPC := 0 let argn := 0 ... let arg1 := 0
    $<name>_start:
    let ret1 := 0 ... let retm := 0
    { desugar(body) }
    swap and pop items so that only ret1, ... retm, $retPC are left on the stack
    jump
    0 (1 + n times) to compensate removal of arg1, ..., argn and $retPC
  }
AssemblyFor('for' { init } condition post body) ->
  {
    init // cannot be its own block because we want variable scope to extend into the body
    // find I such that there are no labels $forI_*
    $forI_begin:
    jumpi($forI_end, iszero(condition))
    { body }
    $forI_continue:
    { post }
    jump($forI_begin)
    $forI_end:
  }
'break' ->
  {
    // find nearest enclosing scope with label $forI_end
    pop all local variables that are defined at the current point
    but not at $forI_end
    jump($forI_end)
    0 (as many as variables were removed above)
  }
'continue' ->
  {
    // find nearest enclosing scope with label $forI_continue
    pop all local variables that are defined at the current point
    but not at $forI_continue
    jump($forI_continue)
    0 (as many as variables were removed above)
  }
AssemblySwitch(switch condition cases ( default: defaultBlock )? ) ->
  {
    // find I such that there is no $switchI* label or variable
    let $switchI_value := condition
    for each of cases match {
      case val: -> jumpi($switchI_caseJ, eq($switchI_value, val))
    }
    if default block present: ->
      { defaultBlock jump($switchI_end) }
    for each of cases match {
      case val: { body } -> $switchI_caseJ: { body jump($switchI_end) }
    }
    $switchI_end:
  }
FunctionalAssemblyExpression( identifier(arg1, arg2, ..., argn) ) ->
  {
    if identifier is function <name> with n args and m ret values ->
      {
        // find I such that $funcallI_* does not exist
        $funcallI_return argn ... arg2 arg1 jump(<name>)
        pop (n + 1 times)
        if the current context is `let (id1, ..., idm) := f(...)` ->
          let id1 := 0 ... let idm := 0
          $funcallI_return:
        else ->
          0 (m times)
          $funcallI_return:
          turn the functional expression that leads to the function call
          into a statement stream
      }
    else -> desugar(children of node)
  }
default node ->
  desugar(children of node)
}
```
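The AssemblyFor rule can be sketched as a plain text transformation in Python. This is a simplification: it operates on source strings rather than a real AST. `existing_labels` is a hypothetical set of labels already in use, from which a fresh index I is chosen so that no `$forI_*` label clashes. The desugared example earlier uses unnumbered labels such as `$for_begin`. This sketch numbers them as in the pseudocode.

```python
def fresh_for_index(existing_labels):
    # Find I such that no label starting with $forI_ exists yet.
    i = 0
    while any(label.startswith(f"$for{i}_") for label in existing_labels):
        i += 1
    return i


def desugar_for(init, condition, post, body, existing_labels):
    i = fresh_for_index(existing_labels)
    return [
        "{",
        f"  {init}",  # not its own block: its variables must stay visible in the body
        f"  $for{i}_begin:",
        f"  jumpi($for{i}_end, iszero({condition}))",
        f"  {{ {body} }}",
        f"  $for{i}_continue:",  # target of `continue`
        f"  {{ {post} }}",
        f"  jump($for{i}_begin)",
        f"  $for{i}_end:",  # target of `break`
        "}",
    ]


# The loop from the function `f` in the example, with $for0_* already taken.
print("\n".join(desugar_for("let i := 0", "lt(i, x)", "i := add(i, 1)",
                            "y := mul(2, y)", {"$for0_begin"})))
```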
Pseudocode:

```
codegen item: AST -> opcode_stream =
match item {
AssemblyBlock({ items }) ->
  join(codegen(item) for item in items)
  if last generated opcode has continuing control flow:
    POP for all local variables registered at the block (including variables
    introduced by labels)
    warn if the stack height at this point is not the same as at the start of the block
Identifier(id) ->
  lookup id in the syntactic stack of blocks
  match type of id
    Local Variable ->
      DUPi where i = 1 + stack_height - stack_height_of_identifier(id)
    Label ->
      // reference to be resolved during bytecode generation
      PUSH<bytecode position of label>
    SubAssembly ->
      PUSH<bytecode position of subassembly data>
FunctionalAssemblyExpression(id ( arguments ) ) ->
  join(codegen(arg) for arg in arguments.reversed())
  id (which has to be an opcode, might be a function name later)
AssemblyLocalDefinition(let (id1, ..., idn) := expr) ->
  register identifiers id1, ..., idn as locals in current block at current stack height
  codegen(expr) - assert that expr returns n items to the stack
FunctionalAssemblyAssignment((id1, ..., idn) := expr) ->
  lookup id1, ..., idn in the syntactic stack of blocks, assert that they are variables
  codegen(expr)
  for j = n, ..., 1:
    SWAPi where i = 1 + stack_height - stack_height_of_identifier(idj)
    POP
AssemblyAssignment(=: id) ->
  look up id in the syntactic stack of blocks, assert that it is a variable
  SWAPi where i = 1 + stack_height - stack_height_of_identifier(id)
  POP
LabelDefinition(name:) ->
  JUMPDEST
NumberLiteral(num) ->
  PUSH<num interpreted as decimal and right-aligned>
HexLiteral(lit) ->
  PUSH32<lit interpreted as hex and left-aligned>
StringLiteral(lit) ->
  PUSH32<lit utf-8 encoded and left-aligned>
SubAssembly(assembly <name> block) ->
  append codegen(block) at the end of the code
dataSize(<name>) ->
  assert that <name> is a subassembly ->
  PUSH32<size of code generated from subassembly <name>>
linkerSymbol(<lit>) ->
  PUSH32<zeros> and append position to linker table
}
```
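The key idea of the Identifier and assignment rules is that a variable is addressed by its distance from the top of the stack. The exact `1 +` offset in the pseudocode depends on how `stack_height` is counted, which the pseudocode leaves implicit. The following minimal Python sketch therefore fixes its own convention and checks the indices against a simulated stack. Here `height` is the number of items on the stack, and each variable records `h`, the number of items below its slot. Under that convention, `DUP(height - h)` copies the variable. Once the new value is on top, `SWAP(height - h - 1)` followed by `POP` stores it. The class `StackTracker` and its methods are hypothetical.

```python
class StackTracker:
    """Simulated stack (top = last element) plus the slot of each local variable."""

    def __init__(self):
        self.stack = []
        self.slots = {}  # variable name -> number of items below its slot

    def let(self, name, value):
        # Register at the current height, then codegen(expr) pushes one item.
        self.slots[name] = len(self.stack)
        self.stack.append(value)

    def dup(self, i):
        # DUPi: push a copy of the i-th item from the top.
        self.stack.append(self.stack[-i])

    def swap(self, i):
        # SWAPi: exchange the top item with the (i + 1)-th item from the top.
        self.stack[-1], self.stack[-1 - i] = self.stack[-1 - i], self.stack[-1]

    def read(self, name):
        i = len(self.stack) - self.slots[name]
        self.dup(i)
        return f"DUP{i}"

    def assign(self, name, value):
        self.stack.append(value)  # codegen(expr) leaves the new value on top
        i = len(self.stack) - self.slots[name] - 1
        self.swap(i)
        self.stack.pop()  # POP discards the old value
        return f"SWAP{i} POP"


t = StackTracker()
t.let("a", 5)
t.let("b", 7)
print(t.read("a"), t.stack)       # DUP2 [5, 7, 5]
t.stack.pop()                     # the copy is consumed, e.g. by an opcode
print(t.assign("a", 1), t.stack)  # SWAP2 POP [1, 7]
```

The EVM only provides DUP1 to DUP16 and SWAP1 to SWAP16, so a variable more than 16 slots deep cannot be reached this way.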