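Here is a snippet of NLTK code which, according to the tutorial, counts how often modal verbs occur in the different genres of the Brown corpus.
I ran it in IPython on Python 3.5. The code is as follows: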
    import nltk
    from nltk.corpus import brown

    cfd = nltk.ConditionalFreqDist(
        (genre, word)
        for genre in brown.categories()
        for word in brown.words(categories=genre)
    )
    genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
    modals = ['can', 'could', 'may', 'might', 'must', 'will']
    cfd.tabulate(conditions=genres, samples=modals)
    cfd.plot(conditions=genres, samples=modals)
And this is where the story begins: it raised an error.
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-72-809f30b47486> in <module>()
      8 # cfd.tabulate(conditions=genres,sample =modals)
      9 cfd = nltk.ConditionalFreqDist(
---> 10     ((genre,word) for genre in brown.categories() for word in brown.words(categories = genre))
     11 #     (genre,word)
     12 #     for genre in brown.categories()

C:\Program Files\Python35\lib\site-packages\nltk\probability.py in __init__(self, cond_samples)
   1751         defaultdict.__init__(self, FreqDist)
   1752         if cond_samples:
-> 1753             for (cond, sample) in cond_samples:
   1754                 self[cond][sample] += 1
   1755

<ipython-input-72-809f30b47486> in <genexpr>(.0)
      8 # cfd.tabulate(conditions=genres,sample =modals)
      9 cfd = nltk.ConditionalFreqDist(
---> 10     ((genre,word) for genre in brown.categories() for word in brown.words(categories = genre))
     11 #     (genre,word)
     12 #     for genre in brown.categories()

C:\Program Files\Python35\lib\site-packages\nltk\corpus\reader\tagged.py in words(self, fileids, categories)
    198     def words(self, fileids=None, categories=None):
    199         return TaggedCorpusReader.words(
--> 200             self, self._resolve(fileids, categories))
    201     def sents(self, fileids=None, categories=None):
    202         return TaggedCorpusReader.sents(

C:\Program Files\Python35\lib\site-packages\nltk\corpus\reader\tagged.py in words(self, fileids)
     81                         self._para_block_reader,
     82                         None)
---> 83                 for (fileid, enc) in self.abspaths(fileids, True)])
     84
     85     def sents(self, fileids=None):

C:\Program Files\Python35\lib\site-packages\nltk\corpus\reader\util.py in concat(docs)
    420         return docs[0]
    421     if len(docs) == 0:
--> 422         raise ValueError('concat() expects at least one object!')
    423
    424     types = set(d.__class__ for d in docs)

ValueError: concat() expects at least one object!
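The message says that concat() received no objects. Naively taking that at face value, I went looking for a mistake in my code and found nothing wrong with the formatting.
Next I suspected the arguments, so I compared them against the NLTK source. The arguments were fine too.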
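For reference, the two lines of probability.py shown in the traceback (1753 and 1754) simply loop over (condition, sample) pairs and increment a counter. Below is a minimal standard-library sketch of that counting. It is not NLTK's implementation: the `count_pairs` helper and the `pairs` data are made up for illustration.

```python
from collections import Counter, defaultdict


def count_pairs(cond_samples):
    """Count samples per condition, mimicking the loop shown in the traceback."""
    table = defaultdict(Counter)
    for cond, sample in cond_samples:
        table[cond][sample] += 1
    return table


# Made-up stand-in data; the real code feeds (genre, word) pairs from brown.
pairs = [("news", "will"), ("news", "can"), ("news", "will"), ("romance", "could")]
table = count_pairs(pairs)
print(table["news"]["will"])
```

Because the pairs are consumed lazily, any exception raised while producing them (here, inside brown.words) surfaces in the middle of this loop, which is why probability.py appears in the traceback even though the failure originates in the corpus reader.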
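Then I went to Stack Overflow, looked around, and found nothing helpful.
At that point I started questioning my life choices, developed a dislike for NLTK, and became skeptical of the course notes.
A few minutes later I calmed down and went through the code again. It is only a few lines, so I printed the arguments one by one: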
    print([genre for genre in brown.categories()])
    # Note: this prints a generator object, not the words themselves
    print(word for word in brown.words(categories="news"))
Both produced output.
So I worked through the arguments again and ran another test:
    for genre in brown.categories():
        for word in brown.words(categories=genre):
            print(word)
And then IPython hung. After I removed print(word), it still stalled and would not respond. It seemed the data was rather large, perhaps too much to process.
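When a loop over a large corpus appears to hang, one option is to look at only the first few items instead of all of them. A minimal sketch with `itertools.islice` follows. The `fake_words` generator is a hypothetical stand-in for a large word stream such as `brown.words(...)`, so the sketch runs without the corpus installed.

```python
from itertools import islice


def fake_words(n):
    """Hypothetical stand-in for a large word stream such as brown.words()."""
    for i in range(n):
        yield "word{}".format(i)


# Take only the first five items; the rest of the stream is never generated.
first_five = list(islice(fake_words(1000000), 5))
print(first_five)
```

The same `islice` call should work on any iterable, so in principle it could wrap the real word stream as well, which keeps an exploratory check cheap.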
((genre,word) for genre in brown.categories() for word in brown.words(categories = genre))
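Passing the argument as a generator expression instead did not help either. Same error.
Out of options, I grudgingly created a new .py file, copied the same code into it to run on its own, and pressed Ctrl+Shift+F10. To my surprise, the plot window popped up and the table was printed as well.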
                  can  could    may  might   must   will
news               93     86     66     38     50    389
religion           82     59     78     12     54     71
hobbies           268     58    131     22     83    264
science_fiction    16     49      4     12      8     16
romance            74    193     11     51     45     43
humor              16     30      8      8      9     13
Cue the confused face...
Time: 2024-10-22 15:54:29