Penn Treebank

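The PTB corpus commonly used in NLP is, in full, the Penn Treebank.

Penn Treebank is the name of a project whose goal is to annotate a corpus with part-of-speech tags and syntactic parses.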
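To make the two annotation layers concrete, here is a minimal sketch of reading a bracketed parse of the kind the Treebank uses. The sentence is a made-up illustration, not taken from the corpus, and `parse_bracketed` and `tagged_words` are hypothetical helper names written for this note, not part of any Treebank tooling. It assumes a simple bracketing with one word per part-of-speech node.

```python
def parse_bracketed(text):
    """Parse a bracketed parse string into nested lists: [label, child, ...]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read_node():
        nonlocal pos
        # Every constituent starts with "(" followed by its label.
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())   # nested constituent
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1  # skip the closing ")"
        return [label] + children

    return read_node()


def tagged_words(tree):
    """Collect (word, POS tag) pairs from nodes that directly hold a word."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    pairs = []
    for child in children:
        if isinstance(child, list):
            pairs.extend(tagged_words(child))
    return pairs


# Illustrative sentence (an assumption, not a corpus excerpt).
EXAMPLE = ("(S (NP (NNP Boeing)) (VP (VBZ is) (VP (VBN located) "
           "(PP (IN in) (NP (NNP Seattle))))) (. .))")

tree = parse_bracketed(EXAMPLE)
print(tagged_words(tree))
```

The nested labels (S, NP, VP, PP) carry the syntactic parse, while the innermost labels (NNP, VBZ, VBN, IN) carry the part-of-speech tags.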

Corpus source: the 1989 Wall Street Journal

Corpus size: 1M words, 2,499 articles

Corpus price: $1700

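The Penn Treebank project has two releases, Treebank-2 and Treebank-3; distribution and licensing fees are handled by the Linguistic Data Consortium (LDC).

As far as I can tell, the corpus content of the two releases is the same; apart from the release date, I am not sure what else differs.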

ref:

http://www.cis.upenn.edu/~treebank/

https://catalog.ldc.upenn.edu/LDC95T7

https://catalog.ldc.upenn.edu/LDC99T42

Time: 2024-10-14 09:11:57
