Count and print the five most frequent words in a document

import jieba

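ls = "中国是一个伟大的国家，是一个好的国家"
print('Original text:', ls)
counts = {}  # dictionary mapping each word to its count
words = jieba.lcut(ls)  # segment the text into a list of words
print('Segmented words:', words)

for word in words:
    counts[word] = counts.get(word, 0) + 1
print('Resulting dictionary:', counts)
print('Dictionary items:', counts.items())
# Convert the dictionary's (word, count) pairs into a list
items = list(counts.items())
print('List built from the items of counts:', items)
# Sort the list by the second element (the count); reverse=True sorts in descending order, ascending is the default
items.sort(key=lambda x: x[1], reverse=True)

print('List sorted by the second element of each tuple:', items)
# Print the first 5 entries (fewer if the list is shorter)
for i in range(min(5, len(items))):
    word, count = items[i]
    print("{0:<10}---{1:>5}".format(word, count))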
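
The dictionary-and-sort steps above can also be written with the standard library's `collections.Counter`. The following is a minimal sketch, not the author's method. The function name `top_words` is hypothetical, and the token list is hand-written to stand in for what `jieba.lcut` might return, so the example runs without jieba installed.

```python
from collections import Counter


def top_words(tokens, n=5):
    """Return up to n (word, count) pairs, most frequent first."""
    # Counter tallies each token; most_common sorts by count, descending.
    return Counter(tokens).most_common(n)


# Hypothetical token list standing in for the output of jieba.lcut(ls).
tokens = ["中国", "是", "一个", "伟大", "的", "国家", "，",
          "是", "一个", "好", "的", "国家"]

for word, count in top_words(tokens):
    print("{0:<10}---{1:>5}".format(word, count))
```

Because `most_common` returns at most `n` pairs, this version does not fail when the text contains fewer than five distinct tokens.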

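#------------
# Variant of the counting loop: skip single-character tokens so that only
# multi-character words are counted. Use it in place of the counting loop
# above, starting from an empty counts dictionary.

for word in words:
    if len(word) == 1:   # skip single-character tokens, which are not treated as words
        continue
    else:
        counts[word] = counts.get(word, 0) + 1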
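
The filtering variant can be combined with the sorting step in a single function. This is a sketch under stated assumptions: `count_multichar_words` is a hypothetical helper name, and the token list is hand-written to stand in for `jieba.lcut` output. Slicing the sorted list with `[:n]` avoids an `IndexError` when fewer than `n` distinct words remain after filtering.

```python
def count_multichar_words(tokens, n=5):
    """Count tokens longer than one character; return up to n top pairs."""
    counts = {}
    for word in tokens:
        if len(word) == 1:  # skip single characters and punctuation
            continue
        counts[word] = counts.get(word, 0) + 1
    # Sort by count, descending; slicing is safe even if fewer than n remain.
    items = sorted(counts.items(), key=lambda x: x[1], reverse=True)
    return items[:n]


# Hypothetical token list standing in for the output of jieba.lcut(ls).
tokens = ["中国", "是", "一个", "伟大", "的", "国家", "，",
          "是", "一个", "好", "的", "国家"]

for word, count in count_multichar_words(tokens):
    print("{0:<10}---{1:>5}".format(word, count))
```

With this token list only four multi-character words remain, so the function returns four pairs rather than five.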

Original article: https://www.cnblogs.com/huigebj/p/11433878.html

Time: 2024-10-07 19:32:37
