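1. Abstract syntax and concrete syntax
Abstract syntax: each interior node represents an operator, and the node's children represent that operator's operands. For example, the abstract syntax tree of the expression 9 - 5 + 2 has + at the root; its left child is the - node with children 9 and 5, and its right child is 2.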
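    expr -> expr1 + term   { print('+') }
    expr -> expr1 - term   { print('-') }
    expr -> term
    term -> 0              { print('0') }
    term -> 1              { print('1') }
    ...
    term -> 9              { print('9') }
This grammar is left-recursive. Eliminating the left recursion is simple: convert it to right recursion, that is, rewrite the productions A -> Aα | Aβ | γ as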
A -> γR
R -> αR | βR | ε
    expr -> term rest
    rest -> + term { print('+') } rest
          | - term { print('-') } rest
          | ε
    term -> 0 { print('0') }
          | 1 { print('1') }
          ...
          | 9 { print('9') }
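If the action were placed after rest instead, the translation would be wrong. Take the expression 9 - 5 + 2. The 9 is a term, so the action {print('9')} runs. The "-" then matches the second production for rest, followed by 5, a term whose action {print('5')} runs. Next comes the nonterminal rest again; since 5 is followed by "+", the first production for rest applies, and the character 2 triggers {print('2')}. rest must be matched once more, and because nothing follows 2, the third production (ε) applies with no action. Control then returns to the first production for rest, whose trailing action {print('+')} runs, and then to the second production, whose trailing action {print('-')} runs. The result is 952+-, which clearly differs from the correct answer 95-2+.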
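The effect of the action's position can be checked with a small C# sketch. The class PostfixDemo and its actionAfterRest flag are hypothetical names introduced only for this illustration; the sketch assumes single-digit operands and the operators + and -, as in the grammar above.

```csharp
using System;
using System.Text;

// Hypothetical demo: translates single-digit infix expressions to postfix,
// with the print action placed either before or after the recursive rest.
class PostfixDemo
{
    private readonly string input;
    private int pos;
    private readonly bool actionAfterRest; // true reproduces the incorrect placement
    private readonly StringBuilder output = new StringBuilder();

    private PostfixDemo(string text, bool actionAfterRest)
    {
        input = text.Replace(" ", "");
        this.actionAfterRest = actionAfterRest;
    }

    // Current lookahead character, or '\0' at the end of the input.
    private char Lookahead => pos < input.Length ? input[pos] : '\0';

    private void Term()
    {
        if (!char.IsDigit(Lookahead)) throw new FormatException("syntax error");
        output.Append(Lookahead); // action { print(digit) }
        pos++;
    }

    private void Rest()
    {
        if (Lookahead == '+' || Lookahead == '-')
        {
            char op = Lookahead;
            pos++; // match the operator
            Term();
            if (!actionAfterRest) output.Append(op); // rest -> op term { print(op) } rest
            Rest();
            if (actionAfterRest) output.Append(op);  // rest -> op term rest { print(op) }
        }
        // otherwise rest -> ε, with no action
    }

    public static string Translate(string text, bool actionAfterRest)
    {
        var p = new PostfixDemo(text, actionAfterRest);
        p.Term();
        p.Rest();
        if (p.pos != p.input.Length) throw new FormatException("syntax error");
        return p.output.ToString();
    }

    static void Main()
    {
        Console.WriteLine(Translate("9 - 5 + 2", false)); // prints 95-2+
        Console.WriteLine(Translate("9 - 5 + 2", true));  // prints 952+-
    }
}
```

With the action before rest, each operator is emitted as soon as its right operand has been printed; with the action after rest, the operators are emitted only as the recursion unwinds, so they come out in reverse order.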
    void expr() {
        term(); rest();
    }

    void rest() {
        if (lookahead == '+') {
            match('+'); term(); print('+'); rest();
        } else if (lookahead == '-') {
            match('-'); term(); print('-'); rest();
        } else {
            // rest -> ε: do nothing
        }
    }

    void term() {
        if (IsNumber(lookahead)) {
            t = lookahead; match(lookahead); print(t);
        } else
            report("syntax error");
    }
2. Simplifying the translator
    void rest() {
        while (true) {
            if (lookahead == '+') {
                match('+'); term(); print('+');
                continue;
            } else if (lookahead == '-') {
                match('-'); term(); print('-');
                continue;
            }
            break;
        }
    }
    /* A simple translator written in C# */
    using System;
    using System.IO;

    namespace CompileDemo {
        class Parser {
            static int lookahead;

            public Parser() {
                lookahead = Console.Read();
            }

            public void expr() {
                term();
                while (true) {
                    if (lookahead == '+') {
                        match('+'); term(); Console.Write('+');
                    } else if (lookahead == '-') {
                        match('-'); term(); Console.Write('-');
                    } else
                        return;
                }
            }

            void term() {
                if (char.IsDigit((char)lookahead)) {
                    Console.Write((char)lookahead);
                    match(lookahead);
                } else
                    throw new Exception("syntax error");
            }

            void match(int t) {
                if (lookahead == t)
                    lookahead = Console.Read();
                else
                    throw new Exception("syntax error");
            }
        }

        class Program {
            static void Main(string[] args) {
                Console.Write("please input an expression composed with numbers and operating chars...\n");
                var parser = new Parser();
                parser.expr();
                Console.Write('\n');
            }
        }
    }
    expr   -> expr + term   { print('+') }
            | expr - term   { print('-') }
            | term
    term   -> term * factor { print('*') }
            | term / factor { print('/') }
            | factor
    factor -> ( expr )
            | num { print(num.value) }
            | id  { print(id.lexeme) }
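1. Constants
When a sequence of digits appears in the input stream, the lexer passes the parser a token consisting of a terminal (such as num) and an integer attribute value computed from those digits. For example, the input 31+28+59 is converted to the token sequence <num, 31> <+> <num, 28> <+> <num, 59>.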
    if (peek holds a digit) {
        v = 0;
        do {
            v = v * 10 + integer value of digit peek;
            peek = next input character;
        } while (peek holds a digit);
        return token <num, v>;
    }
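2. Keywords and identifiers
Suppose identifiers are represented by the terminal id. Then for the input count = count + increment; the parser works with the terminal sequence id = id + id; If an attribute holds the lexeme of each id, this terminal sequence written as tuples is <id, "count"> <=> <id, "count"> <+> <id, "increment"> <;>.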
    if (peek holds a letter) {
        do {
            buffer.Add(peek);          // append the current character to the buffer
            peek = IO.Read();
        } while (IsAlphaOrDigit(peek));
        string s = buffer.ToString();
        w = words.get(s);
        if (w != null)
            return w;                  // reserved word or cached user id
        else {
            words.Add(s, <id, s>);
            return <id, s>;
        }
    }
    /* A lexer written in C# */
    using System;
    using System.Collections;
    using System.Text;

    class Lexer {
        public int line = 1;
        private char peek = ' ';
        private Hashtable words = new Hashtable();

        void reserve(Word w) { words.Add(w.lexeme, w); }

        public Lexer() {
            reserve(new Word(Tag.TRUE, "true"));
            reserve(new Word(Tag.FALSE, "false"));
        }

        public Token Scan() {
            // skip white space and count lines
            for (;; peek = (char)Console.Read()) {
                if (peek == ' ' || peek == '\t') continue;
                else if (peek == '\n') line++;
                else break;
            }
            // a digit sequence becomes a Num token
            if (char.IsDigit(peek)) {
                int v = 0;
                do {
                    v = 10 * v + (peek - '0');
                    peek = (char)Console.Read();
                } while (char.IsDigit(peek));
                return new Num(v);
            }
            // a letter starts a reserved word or an identifier
            if (char.IsLetter(peek)) {
                StringBuilder sb = new StringBuilder();
                do {
                    sb.Append(peek);
                    peek = (char)Console.Read();
                } while (char.IsLetterOrDigit(peek));
                string s = sb.ToString();
                Word w = null;
                if (words.Contains(s))
                    w = (Word)words[s];
                if (w != null)          // reserved word or cached ID
                    return w;
                w = new Word(Tag.ID, s);
                words.Add(s, w);
                return w;
            }
            // any other character is a token by itself
            Token t = new Token(peek);
            peek = ' ';
            return t;
        }
    }

    class Token {
        public readonly int tag;
        public Token(int t) { tag = t; }
    }

    class Tag {
        public readonly static int NUM = 256, ID = 257, TRUE = 258, FALSE = 259;
    }

    class Num : Token {
        public readonly int value;
        public Num(int v) : base(Tag.NUM) { value = v; }
    }

    class Word : Token {
        public readonly string lexeme;
        public Word(int t, string s) : base(t) { lexeme = s; }
    }
    (global var) int w;
    ...
        int x; int y;                // block 1
            int w; bool y; int z;    // block 2
            ...w, y, z, x...
        w, x, y...
    class Env {
        private Hashtable table;
        protected Env prev;

        public Env(Env p) {
            table = new Hashtable();
            prev = p;
        }

        public void Put(string s, Symbol symbol) {
            table.Add(s, symbol);
        }

        public Symbol Get(string s) {
            // traverse from inner to outer
            for (Env e = this; e != null; e = e.prev) {
                if (e.table.Contains(s))
                    return (Symbol)(e.table[s]);
            }
            return null;
        }
    }

    class Symbol {
        public string type { get; set; }
    }
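The chained lookup can be tried on the block example above. The following is a minimal sketch, not the Env class itself: Scope is a hypothetical Dictionary-based variant that stores type names as plain strings, and it assumes that block 2 is nested inside block 1, which in turn is nested inside the global scope.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical variant of a chained symbol table: one table per scope,
// linked to the table of the enclosing scope.
class Scope
{
    private readonly Dictionary<string, string> table = new Dictionary<string, string>();
    private readonly Scope prev;

    public Scope(Scope prev) { this.prev = prev; }

    public void Put(string name, string type) { table[name] = type; }

    // Search from the innermost scope outward; null means "not declared".
    public string Get(string name)
    {
        for (Scope s = this; s != null; s = s.prev)
        {
            if (s.table.TryGetValue(name, out string type)) return type;
        }
        return null;
    }
}

class ScopeDemo
{
    static void Main()
    {
        var global = new Scope(null);
        global.Put("w", "int");

        var block1 = new Scope(global);   // assumed to be nested in the global scope
        block1.Put("x", "int");
        block1.Put("y", "int");

        var block2 = new Scope(block1);   // assumed to be nested in block 1
        block2.Put("w", "int");
        block2.Put("y", "bool");
        block2.Put("z", "int");

        Console.WriteLine(block2.Get("y"));                  // prints bool (inner y hides outer y)
        Console.WriteLine(block2.Get("x"));                  // prints int (found in block 1)
        Console.WriteLine(block1.Get("y"));                  // prints int
        Console.WriteLine(block1.Get("z") ?? "undeclared");  // prints undeclared
    }
}
```

Because each scope only stores its own declarations, leaving a block amounts to dropping the reference to its table; the outer declarations become visible again without any cleanup.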
    program -> { top = null; } block
    block   -> '{' { saved = top; top = new Env(top); print("{"); }
               decls stmts '}' { top = saved; print("}"); }
    decls   -> decls decl | ε
    decl    -> type id; { s = new Symbol(); s.type = type.lexeme; top.Put(id.lexeme, s); }
    stmts   -> stmts stmt | ε
    stmt    -> block | factor; { print(";"); }
factor -> id { s = top.Get(id.lexeme); print(id.lexeme); print(":"); print(s.type); }
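1. Constructing syntax trees
Abstract syntax trees were introduced earlier: operators are the nodes, and a node's children are that operator's operands, as in 9 - 5. As another example, consider a while statement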
while (expr) stmt
new While(x, y)
    program -> block { return block.n; }
    block   -> '{' stmts '}' { block.n = stmts.n; }
    stmts   -> stmts1 stmt { stmts.n = new Seq(stmts1.n, stmt.n); }
             | ε { stmts.n = null; }
    stmt    -> expr ; { stmt.n = new Eval(expr.n); }
             | if ( expr ) stmt1 { stmt.n = new If(expr.n, stmt1.n); }
             | while ( expr ) stmt1 { stmt.n = new While(expr.n, stmt1.n); }
             | do stmt1 while ( expr ) ; { stmt.n = new Do(stmt1.n, expr.n); }
             | block { stmt.n = block.n; }
    expr    -> rel = expr1 { expr.n = new Assign('=', rel.n, expr1.n); }   // right associative
             | rel { expr.n = rel.n; }
    rel     -> rel1 < add { rel.n = new Rel('<', rel1.n, add.n); }
             | rel1 <= add { rel.n = new Rel('≤', rel1.n, add.n); }
             | add { rel.n = add.n; }
    add     -> add1 + term { add.n = new Op('+', add1.n, term.n); }
             | term { add.n = term.n; }
    term    -> term1 * factor { term.n = new Op('*', term1.n, factor.n); }
             | factor { term.n = factor.n; }
    factor  -> ( expr ) { factor.n = expr.n; }
             | num { factor.n = new Num(num.value); }
2. Static checking
The left and right sides of an assignment have different meanings. In i = i + 1; the left side denotes the storage location that receives the value, while the right side denotes an integer value. Static checking requires the left side of an assignment to be an l-value (a storage location).
if(expr) stmt
if(E1.type == E2.type) E.type = boolean; else error;
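As a rough illustration of the rule above, the sketch below checks the operand types of a relational expression such as i < j. The class TypeCheckDemo, the method CheckRel, and the use of strings as type names are assumptions made for this example only.

```csharp
using System;

class TypeCheckDemo
{
    // Returns the type of "E1 rel E2": boolean when both operand types match,
    // otherwise reports a type error by throwing.
    public static string CheckRel(string leftType, string rightType)
    {
        if (leftType == rightType) return "boolean";
        throw new InvalidOperationException("type error: " + leftType + " vs " + rightType);
    }

    static void Main()
    {
        Console.WriteLine(CheckRel("int", "int")); // prints boolean

        try
        {
            CheckRel("int", "bool");
        }
        catch (InvalidOperationException e)
        {
            Console.WriteLine(e.Message); // prints type error: int vs bool
        }
    }
}
```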
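3. Translating statements
The statement if expr then stmt1 is translated into the following layout:
        code that evaluates expr and stores the result in x
        ifFalse x goto after
        code for stmt1
    after: ...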
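    class If : Stmt {
        Expr E;
        Stmt S;

        public If(Expr x, Stmt y) {
            E = x;
            S = y;
            after = newlabel();
        }

        public void gen() {
            Expr n = E.rvalue();    // calculate right value of expression E
            emit("ifFalse " + n.ToString() + " goto " + after);
            S.gen();
            emit(after + ":");
        }
    }
In the If class, E and S denote the if statement's expression expr and statement stmt, respectively. Once the abstract syntax tree for the whole source program has been built, the function gen is called at the root of that tree.
For a node x of class Expr with operator op, the result of the operation is stored in a compiler-generated temporary name (such as t). Thus i - j + k is translated into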
    t1 = i - j
    t2 = t1 + k
For an expression that contains an array access, such as 2 * a[i], the translation is
    t1 = a[i]
    t2 = 2 * t1
    Expr lvalue(x: Expr) {
        if (x is an Id node) return x;
        else if (x is an Access(y, z) node and y is an Id node) {
            return new Access(y, rvalue(z));
        }
        else error;
    }

    Expr rvalue(x: Expr) {
        if (x is an Id or a Constant node) return x;
        else if (x is an Op(op, y, z) or a Rel(op, y, z) node) {
            t = new temporary name;
            generate the instruction string for t = rvalue(y) op rvalue(z);
            return a new node for t;
        }
        else if (x is an Access(y, z) node) {
            t = new temporary name;
            call lvalue(x), which returns an Access(y, z') node;
            generate the instruction string for t = Access(y, z');
            return a new node for t;
        }
        else if (x is an Assign(y, z) node) {
            z' = rvalue(z);
            generate the instruction string for lvalue(y) = z';
            return z';
        }
    }
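1) If the expression x is an identifier or a constant, x itself is returned: for 5 the result is 5, and for a the result is a;
2) If x is an Op operator (+, -, *, /, etc.) or a Rel operator (<, >, ==, <=, >=, !=, etc.), create a temporary name t, compute the r-values of the two operands y and z, apply op to them, assign the result to t, and return t;
3) If x is an array access, create a temporary name t and compute the l-value of x, which yields an Access(y, z') node. The l-value is needed because x is Access(y, z), where z is an expression whose r-value must be computed first. The temporary name t is then assigned the r-value of the whole array access, and t is returned.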
a[i] = 2*a[j-k]
    t3 = j - k
    t2 = a[t3]
    t1 = 2*t2
    a[i] = t1
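The lvalue and rvalue pseudocode can be turned into a small runnable C# sketch that reproduces this translation. The node classes (Id, Constant, Op, Access, Assign) and the CodeGen class are hypothetical names chosen for this illustration; a temporary is allocated before the operands are visited, as in the pseudocode, which is why the outermost operation receives t1.

```csharp
using System;
using System.Collections.Generic;

abstract class Node { }

class Id : Node
{
    public readonly string Name;
    public Id(string name) { Name = name; }
    public override string ToString() { return Name; }
}

class Constant : Node
{
    public readonly int Value;
    public Constant(int value) { Value = value; }
    public override string ToString() { return Value.ToString(); }
}

class Op : Node
{
    public readonly string Operator;
    public readonly Node Left, Right;
    public Op(string op, Node left, Node right) { Operator = op; Left = left; Right = right; }
}

class Access : Node
{
    public readonly Id ArrayId;
    public readonly Node Index;
    public Access(Id arrayId, Node index) { ArrayId = arrayId; Index = index; }
    public override string ToString() { return ArrayId + "[" + Index + "]"; }
}

class Assign : Node
{
    public readonly Node Target, Value;
    public Assign(Node target, Node value) { Target = target; Value = value; }
}

class CodeGen
{
    public readonly List<string> Code = new List<string>();
    private int temps;

    private Id NewTemp() { return new Id("t" + (++temps)); }

    // l-value: an identifier, or an array access whose index is reduced to an r-value.
    public Node LValue(Node x)
    {
        if (x is Id) return x;
        if (x is Access a) return new Access(a.ArrayId, RValue(a.Index));
        throw new InvalidOperationException("not an l-value");
    }

    // r-value: emits three-address instructions and returns the node holding the value.
    public Node RValue(Node x)
    {
        if (x is Id || x is Constant) return x;
        if (x is Op o)
        {
            Id t = NewTemp();
            Node left = RValue(o.Left);
            Node right = RValue(o.Right);
            Code.Add(t + " = " + left + " " + o.Operator + " " + right);
            return t;
        }
        if (x is Access a)
        {
            Id t = NewTemp();
            Node location = LValue(a);
            Code.Add(t + " = " + location);
            return t;
        }
        if (x is Assign s)
        {
            Node value = RValue(s.Value);
            Node target = LValue(s.Target);
            Code.Add(target + " = " + value);
            return value;
        }
        throw new InvalidOperationException("unknown node");
    }
}

class CodeGenDemo
{
    static void Main()
    {
        // a[i] = 2 * a[j - k]
        Node stmt = new Assign(
            new Access(new Id("a"), new Id("i")),
            new Op("*", new Constant(2),
                new Access(new Id("a"), new Op("-", new Id("j"), new Id("k")))));

        var gen = new CodeGen();
        gen.RValue(stmt);
        foreach (string line in gen.Code) Console.WriteLine(line);
        // prints:
        // t3 = j - k
        // t2 = a[t3]
        // t1 = 2 * t2
        // a[i] = t1
    }
}
```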