Word segmentation with the jieba library
I won't go over installing jieba here; a plain pip install jieba will do!
import jieba
Segment the title and convert the result to a list:
seg_list = list(jieba.cut(result.get("title"), cut_all=False))
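For a concrete feel of what cut returns, here is a minimal sketch on a made-up title string (the exact segmentation depends on jieba's dictionary):
sample = list(jieba.cut("自然语言处理入门", cut_all=False))  # made-up example title
print(sample)  # e.g. ['自然语言', '处理', '入门'], depending on the dictionary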
Join all the titles with spaces, which makes the natural-language processing later on easier:
para = para + " ".join(seg_list) + " "  # trailing space keeps consecutive titles from running together
Put each segmented title (the space-joined string) into a list:
summaryList.insert(0, " ".join(seg_list))  # insert at the front, so the newest title comes first
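Putting the pieces together, a hedged sketch of the surrounding loop (results and the "title" key are assumptions inferred from result.get("title") above):
import jieba

para = ""
summaryList = []
for result in results:  # results: assumed iterable of dicts with a "title" key
    seg_list = list(jieba.cut(result.get("title"), cut_all=False))
    para = para + " ".join(seg_list) + " "       # accumulate all titles into one string
    summaryList.insert(0, " ".join(seg_list))    # keep each segmented title separately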
Count word frequencies
from nltk.tokenize import WordPunctTokenizer
import nltk

# Count word frequencies: split para into individual tokens,
# then build a frequency distribution over them.
tokenizer = WordPunctTokenizer()
tokens = tokenizer.tokenize(para)  # tokenize para into a list of tokens
wordFreq = nltk.FreqDist(tokens)
for word in wordFreq:
    print(word, wordFreq[word])
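For a quick top-N view, FreqDist also provides most_common:
for word, count in wordFreq.most_common(10):  # the ten most frequent tokens
    print(word, count)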