Measuring Text Difficulty Using Parse-Tree Frequency

https://nlp.lab.arizona.edu/sites/nlp.lab.arizona.edu/files/Kauchak-Leroy-Hogue-JASIST-2017.pdf

In previous work, we conducted a preliminary corpus study of grammar frequency which showed that difficult texts use a wider variety of high-level grammatical structures (Kauchak et al., 2012). However, because of the large number of structural variations possible, no clear indication was found showing specific structures predominantly appearing in either easy or difficult documents.

In this work, we propose a much more fine-grained analysis. We propose a measure of text difficulty based on grammatical frequency and show how it can be used to identify sentences with difficult syntactic structures. In particular, the grammatical difficulty of a sentence is measured based on the frequency of occurrence of the top-level parse tree structure of the sentence in a large corpus.

Building on the notion of term familiarity, the authors introduce the concept of grammar familiarity:

Grammar familiarity is measured as the frequency of the 3rd level sentence parse tree
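
To make this concrete, here is a minimal sketch (not the paper's code) of extracting a 3rd-level structure from a bracketed constituency parse with NLTK; how levels are counted and how preterminals and words are handled is an assumption here.

```python
from nltk import Tree

def truncate_tree(tree, depth):
    """Keep constituent labels down to `depth` (root = level 1); drop all words."""
    if not isinstance(tree, Tree):       # a word: never part of the structure
        return None
    if depth == 1 or all(not isinstance(c, Tree) for c in tree):
        return tree.label()              # cutoff reached, or a preterminal (POS tag)
    children = [truncate_tree(c, depth - 1) for c in tree]
    return Tree(tree.label(), [c for c in children if c is not None])

def level_3_structure(parse_string):
    """Map a bracketed constituency parse to its 3rd-level structure string."""
    tree = Tree.fromstring(parse_string)
    return truncate_tree(tree, 3).pformat(margin=10**6)   # single-line form for counting

# Example: output of any constituency parser for "The dog barked."
parse = "(S (NP (DT The) (NN dog)) (VP (VBD barked)) (. .))"
print(level_3_structure(parse))          # (S (NP DT NN) (VP VBD) .)
```

Any parser that outputs bracketed trees can feed this helper; identical 3rd-level structures then compare equal as strings.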

Experiment:

Wikipedia sentences were grouped into 11 bins according to their 3rd-level parse trees, based on how frequently each structure occurs in the corpus;

then 20 sentences were randomly sampled from each bin (controlling for sentence length and term familiarity), and around thirty participants were recruited to rate each sentence on a 5-point scale and to complete a Cloze test;

the finding: even when sentences showed little visible difference in performance, sentences whose 3rd-level parse trees occur more frequently were rated as easier and took less time to complete.

Hypothesis:

The study examines how grammatical frequency impacts the difficulty of a sentence and introduces a new measure of sentence-level text difficulty based on the grammatical structure of the sentence.

Sentence difficulty is split into perceived and actual difficulty.

Our work here makes a step towards better simplification tools by 1) introducing a sentence-level, data-driven approach for measuring the grammatical difficulty of a sentence and 2) specifically measuring the impact of this measure using both how difficult a sentence looks (perceived difficulty) as well as how difficult a sentence is to understand (actual difficulty).

Function words and verbs are more prevalent in simple texts

simple texts use simpler words, fewer overall words and words that are more general (Coster & Kauchak, 2011; Napoles & Dredze, 2010; Zhu, Bernhard, & Gurevych, 2010). Certain types of words have also been found to be more prevalent in simpler texts including function words and verbs (Kauchak, Leroy, & Coster, 2012; Leroy & Endicott, 2011).

The Role of Syntax in Simplification

The syntax or grammar of a language dictates how words and phrases interact to form sentences.

Splitting long sentences has been shown to improve Cloze scores (Kandula, Curtis, & Zeng-Treitler, 2010), and additive and causal connectors (expressing reason or purpose) were easier to fill in than adversative or sequential connectors (expressing contrast or temporal order) (Goldman & Murray, 1992). It has been suggested that grammatical difficulty is particularly important for L2 learners since they are still trying to learn appropriate grammatical structures for the language (Callan & Eskenazi, 2007; Clahsen & Felser, 2006).

(Note on logical connectors: https://staff.washington.edu/marynell/grammar/logicalconnectors.html)

Simple texts use a higher proportion of verbs, function words, and adverbs, while difficult texts use a higher proportion of adjectives and nouns and longer noun phrases; in medical texts, simpler documents are more likely to use the active voice; subject-verb-object versus object-subject-verb ordering also differs.

Some initial success has been achieved by automated simplification systems that perform syntactic transformations, such as reducing prepositional phrases and infinitives or changing verb tense.

Choosing the parse-tree structure level:

We chose to focus on the 3rd level since it represents a compromise between generality and specificity.

45% of sentences in the corpus (2.47M) have unique 4th level parse tree structures, often because the 4th level regularly includes lexical components. Once words are included, the structures generalize poorly to other sentences.

To remove anomalous data and likely misparses, we ignored any structure that had only been seen once among the 5.4 million sentences. After filtering, this results in 139,969 unique 3rd level structures.
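
A sketch of this counting-and-filtering step, reusing `level_3_structure` from the earlier sketch and a toy list of parses in place of the 5.4 million Wikipedia parses:

```python
from collections import Counter

def count_level3_structures(parse_strings, min_count=2):
    """Count 3rd-level structures and drop singletons (likely misparses)."""
    counts = Counter(level_3_structure(p) for p in parse_strings)
    return Counter({s: c for s, c in counts.items() if c >= min_count})

# Toy usage: one bracketed parse per sentence. The study effectively used
# min_count=2 (structures seen only once were ignored).
parses = [
    "(S (NP (DT The) (NN dog)) (VP (VBD barked)) (. .))",
    "(S (NP (DT A) (NN cat)) (VP (VBD slept)) (. .))",
]
structure_counts = count_level3_structures(parses, min_count=1)
print(structure_counts.most_common(1))
```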

Table 1:

Pairs of sentences that share the same 3rd-level structure, for structures of varying frequency, ordered from most frequent to least frequent. Because we focus on the high-level structure, the length of sentences with the same structure can also vary widely.

Figure 2:

Grammatical frequency follows a Zipf-like distribution, with the most common structures occurring very frequently and many structures occurring only rarely.
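
A quick way to eyeball that Zipf-like shape, reusing `structure_counts` from the previous sketch (built over the full corpus); a roughly straight line on log-log axes is the expected signature:

```python
import matplotlib.pyplot as plt

# structure_counts is the Counter from the previous sketch.
freqs = sorted(structure_counts.values(), reverse=True)
ranks = range(1, len(freqs) + 1)

plt.loglog(ranks, freqs, marker=".", linestyle="none")
plt.xlabel("structure rank")
plt.ylabel("structure frequency")
plt.title("Rank vs. frequency of 3rd-level structures")
plt.show()
```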

This approach for measuring the grammatical difficulty of text represents a generalized, data-driven approach that goes beyond specific, theory-based grammatical components of text difficulty (e.g., active vs. passive voice, self-embedded clauses, etc.; Meyer & Rice, 1984) and provides a generic framework for measuring grammatical difficulty.

Evaluation:

To minimize confounding factors that might influence sentence difficulty, we control for sentence length and term familiarity.

1) We ranked the 139,939 unique 3rd level structures and divided them into 11 frequency bins. The first bin contains the structures accounting for the top 1% of cumulative frequency; each of the following ten bins covers roughly the next 10%.

2) Each of the 5.4 million Wikipedia sentences can be mapped to one of the 11 frequency bins, and we selected a subset of these for our study.

3) Only sentences near the mean length were kept: assuming sentence length is roughly uniformly distributed, the longest sixth and the shortest sixth of sentences were removed, leaving two thirds of the sentences.

4) From each bin, 20 sentences were randomly selected, balancing grammar frequency against sentence length and term familiarity (see the sketch after this list):

Sentence length: the remaining sentences were split into three length tiers; of the 20 sentences selected per bin, 10 come from the longest tier and 10 from the shortest.

Term familiarity: word familiarity was computed from the Google Web corpus and averaged over each sentence; sentences were split into three familiarity tiers, and of the 20 selected per bin, 10 come from the highest-familiarity tier and 10 from the lowest.

→

This process resulted in a sample of 220 sentences in 11 frequency bins with each bin containing 5 long sentences with high familiarity, 5 long with low familiarity, 5 short with high familiarity, and 5 short with low familiarity.
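
A sketch of the bin-assignment and stratified-sampling logic described above; the exact bin boundaries, field names, and tier labels are assumptions, not taken from the paper:

```python
import random
from bisect import bisect_left

def assign_frequency_bins(structure_counts, n_bins=11):
    """Assign each structure to a bin by cumulative frequency mass: bin 0 holds the
    most frequent structures up to 1% of the total, later bins roughly 10% each."""
    total = sum(structure_counts.values())
    thresholds = [0.01 + 0.10 * i for i in range(n_bins)]   # 0.01, 0.11, ..., 1.01
    bins, cumulative = {}, 0.0
    for structure, count in structure_counts.most_common():
        cumulative += count / total
        bins[structure] = min(bisect_left(thresholds, cumulative), n_bins - 1)
    return bins

def sample_study_sentences(candidates, per_cell=5, rng=random.Random(0)):
    """candidates: dicts with hypothetical keys 'sentence', 'bin' (0-10),
    'length_tier' ('short'/'long'), 'familiarity_tier' ('low'/'high').
    Draws 5 sentences per (bin, length, familiarity) cell: 11 * 2 * 2 * 5 = 220."""
    sample = []
    for b in range(11):
        for length in ("short", "long"):
            for fam in ("low", "high"):
                cell = [c for c in candidates
                        if c["bin"] == b
                        and c["length_tier"] == length
                        and c["familiarity_tier"] == fam]
                sample.extend(rng.sample(cell, per_cell))
    return sample
```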

For each of the 220 sentences, we recruited 30 participants for a total of N=6,600 samples. To ensure the quality and accuracy of the data, participants were restricted to be within the United States and to have a previous approval rating of 95%.

Crowdsourcing: MTurk is a crowdsourcing tool where requesters can upload tasks to be accomplished by a set of workers for a fee.

Results:

A paired-samples t–test showed our two control variables to be effective, with length significantly different between short and long sentences (t(10) = -60.47, p < 0.001) and word frequency significantly different between the high and low group (t(10) = -38.47, p < 0.001).
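
For illustration, a manipulation check of this form can be run with SciPy's paired t-test over the 11 per-bin means; the numbers below are placeholders, not the study's data:

```python
from scipy import stats

# Per-bin mean sentence length for the short vs. long groups (placeholder values;
# the real per-bin means come from the 220 selected sentences, paired across 11 bins).
short_means = [9.8, 10.1, 9.9, 10.3, 10.0, 9.7, 10.2, 9.9, 10.1, 10.0, 9.8]
long_means = [24.5, 25.1, 24.8, 25.3, 24.9, 24.7, 25.2, 24.6, 25.0, 24.8, 24.9]

t, p = stats.ttest_rel(short_means, long_means)
print(f"t({len(short_means) - 1}) = {t:.2f}, p = {p:.4g}")   # df = 10, matching t(10)
```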

1) Actual difficulty: To measure actual difficulty (the first dependent variable), we used a Cloze test. The basic Cloze test involves replacing every nth word in a text with a blank. Participants are then asked to fill in the blanks and are scored based on how many of their answers match the original text (Taylor, 1953).

We employed a multiple-choice Cloze test. For each sentence, four nouns were randomly selected and replaced with blanks. We then created five multiple-choice options containing the four removed words in different random orders, one of which is the correct ordering.
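
A hypothetical sketch of how such a multiple-choice Cloze item could be generated (the paper does not publish its item-construction code; NLTK tagging and the helper below are assumptions):

```python
import random
import nltk   # requires the 'punkt' and 'averaged_perceptron_tagger' resources

def make_cloze_item(sentence, n_blanks=4, rng=random.Random(0)):
    """Blank out n randomly chosen nouns and build 5 answer orderings,
    exactly one of which lists the removed nouns in their original order."""
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    nouns = [i for i, (_, tag) in enumerate(tagged) if tag.startswith("NN")]
    chosen = sorted(rng.sample(nouns, n_blanks))          # assumes >= n_blanks nouns
    answer = tuple(tokens[i] for i in chosen)

    blanked = [("_____" if i in chosen else tok) for i, tok in enumerate(tokens)]

    options = {answer}
    while len(options) < 5:                               # assumes enough distinct orderings
        shuffled = list(answer)
        rng.shuffle(shuffled)
        options.add(tuple(shuffled))
    options = sorted(options, key=lambda _: rng.random()) # randomize option order
    return " ".join(blanked), options, answer
```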

2) Perceived difficulty: To measure perceived difficulty (the second dependent variable), participants were asked to rate the sentences on a 5-point Likert scale, with higher numbers representing more difficult sentences.

Each condition (11 x 2 x 2) had 5 sentences, and for each sentence we gathered 30 responses, resulting in a dataset of N=6,600.

An ANOVA shows these differences to be significant for grammar frequency and sentence length (F(10,6556) = 3.453, p < 0.001) and for grammar frequency and term familiarity (F(10,6556) = 1.870, p = 0.044). In addition, the interaction among all three variables is also significant (F(10,6556) = 4.650, p < 0.001).

(Note: ANOVA stands for Analysis of Variance.)

1. An independent-samples t-test only compares whether two groups differ and whether that difference is statistically significant, e.g., comparing the heights or weights of two groups of people; the two groups are independent and unrelated, and the test simply asks whether there is a statistically significant difference between them.

2. One-way ANOVA tests whether the different levels of a single factor have a significant effect on the observed variable; in other words, it tests whether variation in x significantly affects y, so it is used when one variable is assumed to influence another. Unlike the t-test, ANOVA can compare more than two groups, and factorial designs (as in this study) can test several factors and their interactions at once.
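
For reference, a factorial ANOVA of this shape can be fit with statsmodels; the DataFrame below is synthetic and the column names are hypothetical, since the study's raw responses are not reproduced here:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Synthetic stand-in for the long-format response table (one row per response).
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "freq_bin": rng.integers(1, 12, n),
    "length": rng.choice(["short", "long"], n),
    "familiarity": rng.choice(["low", "high"], n),
    "cloze_score": rng.random(n),          # placeholder accuracy in [0, 1]
})

# Factorial ANOVA: grammar-frequency bin x sentence length x term familiarity.
model = ols("cloze_score ~ C(freq_bin) * C(length) * C(familiarity)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```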

Less frequent structures are more difficult (one-tailed Pearson correlation coefficient):

To complete this analysis and understand the strength of the effect on actual difficulty, we calculated a one-tailed Pearson correlation coefficient between the grammar frequency and the actual difficulty (percentage correct) for both the raw scores and scores aggregated by frequency bin. There was a negative correlation between grammar frequency and the actual difficulty of the sentence (raw scores: N = 6,600, r = -0.053, p < 0.01; bin averages: N = 11, r = -0.596, p < 0.05) indicating that sentences that used less frequent structures were harder to understand.
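
A sketch of the corresponding one-tailed correlation test with SciPy; the vectors below are placeholders, not the study's values:

```python
from scipy import stats

# Placeholder per-bin vectors (length 11); the study paired grammar frequency with
# actual difficulty derived from the Cloze test.
grammar_frequency = [1200, 950, 800, 640, 500, 410, 330, 250, 180, 120, 60]
actual_difficulty = [0.10, 0.11, 0.12, 0.12, 0.13, 0.15, 0.14, 0.16, 0.17, 0.18, 0.20]

# One-tailed test for a negative correlation. The `alternative` keyword needs SciPy >= 1.9;
# on older versions, run the two-tailed test and halve p when the sign is as predicted.
r, p = stats.pearsonr(grammar_frequency, actual_difficulty, alternative="less")
print(f"r = {r:.3f}, one-tailed p = {p:.3g}")
```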

At medium structure frequencies, long and short sentences have similar accuracy:

In contrast to actual difficulty, we also find a main effect of sentence length on perceived difficulty, with longer sentences seen as more difficult (average 2.2) than shorter sentences (average 2.0). Surprisingly, there was no effect of the average term frequency on perceived difficulty.

The effect of grammar frequency on perceived difficulty is smaller in shorter sentences and in those with lower term frequency.

Both high and low frequency sentences show a jump in difficulty, though it occurs earlier (bin 7) for low-frequency sentences than for high-frequency sentences (bin 8).

We found a significant correlation between how well readers performed on the Cloze test and how difficult they thought a sentence was. Lower accuracy correlated with higher difficulty scores (N = 11, r = -0.574, p < 0.05; N = 6,600, r = -0.203, p < 0.01).

Actual and perceived difficulty as measured in our user study for the 220 sentences binned by the Flesch-Kincaid grade level:

Even though the Flesch-Kincaid formula shows little difference in difficulty across the 220 sentences, the differences in perceived difficulty are substantial.
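
For comparison, a Flesch-Kincaid grade can be computed per sentence, e.g. with the textstat package; the paper does not specify its FK tooling, so this is only a convenient stand-in:

```python
import textstat   # third-party readability package

# Flesch-Kincaid grade level = 0.39 * (words / sentences)
#                            + 11.8 * (syllables / words) - 15.59
sentence = "The committee postponed the unprecedented infrastructure negotiations indefinitely."
print(textstat.flesch_kincaid_grade(sentence))
```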

GRAMMAR FAMILIARITY AS AN ANALYSIS TOOL:

Corpus:

Each of the texts was tokenized and split into sentences using the Stanford CoreNLP toolkit and then parsed using the Berkeley Parser (the same preprocessing as for the frequency bins).
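
A rough equivalent of that preprocessing pipeline in Python, substituting spaCy + benepar (the Berkeley Neural Parser) for CoreNLP and the original Berkeley Parser; the model names are assumptions and must be downloaded separately:

```python
import benepar
import spacy

# Setup (run once):
#   python -m spacy download en_core_web_md
#   python -c "import benepar; benepar.download('benepar_en3')"
nlp = spacy.load("en_core_web_md")
nlp.add_pipe("benepar", config={"model": "benepar_en3"})

doc = nlp("The dog barked loudly. The neighbors complained about it.")
for sent in doc.sents:
    parse = sent._.parse_string              # bracketed constituency parse
    print(level_3_structure(parse))          # reuse the helper from the earlier sketch
```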

Summary:

1. It describes the mismatch in existing research between actual readability and perceived difficulty.

2. It demonstrates the correlation between 3rd-level parse-tree frequency and both actual and perceived difficulty, and the effectiveness of the measure.

3. In short sentences, actual difficulty is barely affected by grammar, because shorter sentences are easy to understand and any effect of grammar is difficult to detect (a ceiling effect).

Similarly, in sentences with low term familiarity (i.e., more difficult words), grammar familiarity does not impact text difficulty, since users are struggling with the lexical difficulty.

However, in sentences with very familiar terms, which are easier to understand, grammar frequency does have an impact on actual difficulty; only in sentences where the words are more familiar does the grammatical frequency have a strong effect. Interestingly, there was very little impact overall of term frequency on actual difficulty.

Based on these observations, we hypothesize that there is a relation between grammatical frequency and term frequency. Future studies are required to fully validate these hypotheses. Our study has limitations: text comprehension was measured with individual sentences rather than longer passages.

Original post: https://www.cnblogs.com/rosyYY/p/10523069.html
