4.7.6 Compaction of LR Parsing Tables

A typical programming language grammar with 50 to 100 terminals and 100 productions may have an LALR parsing table with several hundred states. The action function may easily have 20,000 entries, each requiring at least 8 bits to encode. On small devices, a more efficient encoding than a two-dimensional array may be important. We shall mention briefly a few techniques that have been used to compress the ACTION and GOTO fields of an LR parsing table.

One useful technique for compacting the action field is to recognize that usually many rows of the action table are identical. For example, in Fig. 4.42, states 0 and 3 have identical action entries, and so do 2 and 6. We can therefore save considerable space, at little cost in time, if we create a pointer for each state into a one-dimensional array. Pointers for states with the same actions point to the same location. To access information from this array, we assign each terminal a number from zero to one less than the number of terminals, and we use this integer as an offset from the pointer value for each state. In a given state, the parsing action for the ith terminal will be found i locations past the pointer value for that state.
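
As a rough illustration of the pointer scheme just described, the following Python sketch stores each distinct row once in a one-dimensional array and keeps one pointer per state. The function names `compact_rows` and `lookup`, and the small four-state table, are hypothetical and are not taken from Fig. 4.42; an empty string stands for an error entry.

```python
def compact_rows(action_matrix):
    """Store each distinct row once; return (pointers, flat_array)."""
    flat = []            # the one-dimensional array of actions
    offset_of_row = {}   # maps a row's contents to its offset in flat
    pointers = []        # pointers[state] = offset of that state's row
    for row in action_matrix:
        key = tuple(row)
        if key not in offset_of_row:
            offset_of_row[key] = len(flat)
            flat.extend(row)
        pointers.append(offset_of_row[key])
    return pointers, flat


def lookup(pointers, flat, state, terminal_index):
    """The action for the i-th terminal lies i locations past the pointer."""
    return flat[pointers[state] + terminal_index]


# Hypothetical table: 4 states, 3 terminals numbered 0..2.
matrix = [
    ["s2", "s3", ""],
    ["", "", "acc"],
    ["s2", "s3", ""],    # identical to the row for state 0
    ["r1", "r1", "r1"],
]
pointers, flat = compact_rows(matrix)
```

Here states 0 and 2 receive the same pointer, so only three of the four rows are stored. Each lookup costs one extra addition compared with indexing the matrix directly.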

Further space efficiency can be achieved at the expense of a somewhat slower parser by creating a list for the actions of each state. The list consists of (terminal-symbol, action) pairs. The most frequent action for a state can be placed at the end of the list, and in place of a terminal we may use the notation “any,” meaning that if the current input symbol has not been found so far on the list, we should do that action no matter what the input is. Moreover, error entries can safely be replaced by reduce actions, for further uniformity along a row. The errors will be detected later, before a shift move.

Example 4.65: Consider the parsing table of Fig. 4.37. First, note that the actions for states 0, 4, 6, and 7 agree. We can represent them all by the list


SYMBOL


ACTION


id


s5


(


s4


any


error

State 1 has a similar list:


SYMBOL


ACTION


+


s6


$


acc


any


error

In state 2, we can replace the error entries by r2, so reduction by production 2 will occur on any input but *. Thus the list for state 2 is


SYMBOL


ACTION


*


s7


any


r2

State 3 has only error and r4 entries. We can replace the former by the latter, so the list for state 3 consists of only the pair (any, r4). States 5, 10, and 11 can be treated similarly. The list for state 8 is


SYMBOL


ACTION


+


s6


)


s11


any


error

and for state 9


SYMBOL


ACTION


*


S7


any


R1
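
The lists of this example can be searched with a simple linear scan. The sketch below is one possible encoding, not a prescribed one: the names `ANY`, `ACTION_LISTS`, and `action` are hypothetical, and the lists for states 5, 10, and 11 are omitted because the text above does not spell them out.

```python
ANY = "any"  # marker for the default entry that ends each list

# One list object is shared by states 0, 4, 6, and 7.
SHARED_0_4_6_7 = [("id", "s5"), ("(", "s4"), (ANY, "error")]

ACTION_LISTS = {
    0: SHARED_0_4_6_7,
    4: SHARED_0_4_6_7,
    6: SHARED_0_4_6_7,
    7: SHARED_0_4_6_7,
    1: [("+", "s6"), ("$", "acc"), (ANY, "error")],
    2: [("*", "s7"), (ANY, "r2")],   # error entries replaced by r2
    3: [(ANY, "r4")],                # error entries replaced by r4
    8: [("+", "s6"), (")", "s11"), (ANY, "error")],
    9: [("*", "s7"), (ANY, "r1")],   # error entries replaced by r1
}


def action(state, terminal):
    """Scan the state's list; the final 'any' pair matches every input."""
    for symbol, act in ACTION_LISTS[state]:
        if symbol == terminal or symbol == ANY:
            return act
    raise ValueError("action list has no default entry")
```

For instance, `action(2, "+")` falls through to the default and yields `r2`, where the original table also has r2, while `action(2, "id")` now yields `r2` where the original table has an error entry. As noted above, such an error is still caught before the next shift.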

We can also encode the GOTO table by a list, but here it appears more efficient to make a list of pairs for each nonterminal A. Each pair on the list for A is of the form (currentState, nextState), indicating

GOTO [currentState, A] = nextState

This technique is useful because there tend to be rather few states in any one column of the GOTO table. The reason is that the GOTO on nonterminal A can only be a state derivable from a set of items in which some items have A immediately to the left of a dot. No set has items with X and Y immediately to the left of a dot if X ≠ Y. Thus, each state appears in at most one GOTO column.

For more space reduction, we note that the error entries in the goto table are never consulted. We can therefore replace each error entry by the most common non-error entry in its column. This entry becomes the default; it is represented in the list for each column by one pair with any in place of currentState.

Example 4.66: Consider Fig. 4.37 again. The column for F has entry 10 for state 7, and all other entries are either 3 or error. We may replace error by 3 and create for column F the list


CURRENTSTATE


NEXTSTATE


7


10


any


3

Similarly, a suitable list for column T is


CURRENTSTATE


NEXTSTATE


6


9


any


2

For column E we may choose either 1 or 8 to be the default; two entries are necessary in either case. For example, we might create for column E the list


CURRENTSTATE


NEXTSTATE


4


8


any


1
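
The three column lists of this example can be looked up in the same way as the action lists. The following sketch is only one way to hold them; `ANY`, `GOTO_LISTS`, and `goto` are hypothetical names chosen for illustration.

```python
ANY = "any"  # marker for the default (most common) entry of a column

# One list of (currentState, nextState) pairs per nonterminal.
GOTO_LISTS = {
    "E": [(4, 8), (ANY, 1)],
    "T": [(6, 9), (ANY, 2)],
    "F": [(7, 10), (ANY, 3)],
}


def goto(current_state, nonterminal):
    """Return GOTO[currentState, A] by scanning the list for column A."""
    for state, next_state in GOTO_LISTS[nonterminal]:
        if state == current_state or state == ANY:
            return next_state
    raise ValueError("goto list has no default entry")
```

Because the parser never consults an error entry of the GOTO table, returning the default for a state that had an error entry, as in `goto(7, "E")`, is harmless.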

The space savings in these small examples may be misleading: the total number of entries in the lists created in this example and the previous one, together with the pointers from states to action lists and from nonterminals to next-state lists, results in unimpressive savings over the matrix implementation of Fig. 4.37. For practical grammars, however, the space needed for the list representation is typically less than ten percent of that needed for the matrix representation. The table-compression methods for finite automata that were discussed in Section 3.9.8 can also be used to represent LR parsing tables.
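
To make the comparison for the small example concrete, the following back-of-the-envelope count tallies storage units for both representations. It rests on assumptions not stated in the text: that Fig. 4.37 has 12 states, 6 terminals (id, +, *, (, ), $), and 3 nonterminals; that states 5, 10, and 11 each get a one-pair list like state 3; and that a matrix cell, each half of a pair, and a pointer all cost one unit.

```python
# Assumed dimensions of the table in Fig. 4.37.
STATES, TERMINALS, NONTERMINALS = 12, 6, 3

# Matrix representation: one cell per (state, grammar symbol).
matrix_cells = STATES * (TERMINALS + NONTERMINALS)

# Number of pairs in each distinct action list of Example 4.65.
# The list shared by states 0, 4, 6, and 7 is counted once.
action_list_pairs = {
    "0,4,6,7": 3, "1": 3, "2": 2, "3": 1, "5": 1,
    "8": 3, "9": 2, "10": 1, "11": 1,
}
# Number of pairs in each column list of Example 4.66.
goto_list_pairs = {"E": 2, "T": 2, "F": 2}

pairs = sum(action_list_pairs.values()) + sum(goto_list_pairs.values())
pointers = STATES + NONTERMINALS      # one per state, one per nonterminal
list_cells = 2 * pairs + pointers     # each pair occupies two units

print(matrix_cells, list_cells)
```

Under this crude measure the lists take 61 units against 108 for the matrix, a saving far short of the factor of ten reported for practical grammars.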

Time: 2024-10-27 04:26:52