[Bash] LeetCode 192. Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

Example:

Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Note:

  • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
  • Could you write it in one line using Unix pipes?
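The example above can be turned into a quick self-check. The following is only a sketch: the function name word_frequency and the scratch directory are assumptions made for illustration, and the pipeline in the function body is just one possible answer that can be swapped for any other candidate.

```shell
# Recreate the sample input from the problem statement in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'the day is sunny the the' 'the sunny is is' > "$workdir/words.txt"

# Expected output, copied from the example in the problem statement.
expected=$(printf '%s\n' 'the 4' 'is 3' 'sunny 2' 'day 1')

# word_frequency is a hypothetical name; put the pipeline under test in its body.
word_frequency() {
    tr -s ' ' '\n' < "$1" | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
}

actual=$(word_frequency "$workdir/words.txt")
if [ "$actual" = "$expected" ]; then
    echo "PASS"
else
    echo "FAIL"
    printf '%s\n' "$actual"
fi

rm -rf "$workdir"
```

Comparing whole strings keeps the check strict about both the order of the lines and the "word count" column layout.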




Runtime: 4 ms

# Read from the file words.txt and output the word frequency list to stdout.
# sort -nr compares the counts numerically rather than as text.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
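To see what each stage of this kind of pipeline contributes, it can help to run the stages one at a time on the sample input. The sketch below does that with intermediate variables; the variable names and the temporary file are assumptions for illustration, and the fourth stage uses a numeric sort (-n) so that the counts are compared as numbers.

```shell
# Scratch copy of the sample input, so nothing in the current directory is touched.
sample=$(mktemp)
printf '%s\n' 'the day is sunny the the' 'the sunny is is' > "$sample"

# Stage 1: turn every run of spaces into a single newline (one word per line).
words=$(tr -s ' ' '\n' < "$sample")

# Stage 2: sort so identical words become adjacent; uniq only merges neighbours.
grouped=$(printf '%s\n' "$words" | sort)

# Stage 3: collapse each run of identical lines into "<count> <word>".
counted=$(printf '%s\n' "$grouped" | uniq -c)

# Stage 4: order by count, numerically and descending.
ranked=$(printf '%s\n' "$counted" | sort -nr)

# Stage 5: swap the columns into the required "<word> <count>" form.
result=$(printf '%s\n' "$ranked" | awk '{ print $2, $1 }')

printf '%s\n' "$result"
rm -f "$sample"
```

Printing $words, $grouped, $counted and $ranked in turn shows the data after each step. The first sort is the one that is easy to forget: without it, uniq -c would count each separate run of "the" on its own.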


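Runtime: 8 ms

# Read from the file words.txt and output the word frequency list to stdout.
awk '{
    # Count every whitespace-separated field on the line.
    for (i = 1; i <= NF; ++i) ++s[$i];
} END {
    # Emit "word count" pairs; the order is fixed by the sort below.
    for (i in s) print i, s[i];
}' words.txt | sort -nr -k 2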
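One property of the awk approach is that awk's default field splitting already treats any run of blanks as a single separator and ignores leading and trailing blanks, so no extra step is needed to discard empty "words". The sketch below illustrates this on a deliberately messy input; the input text and the temporary file are made up for the example and are not part of the problem's test data.

```shell
# Hypothetical input with leading, trailing and repeated spaces plus an empty line.
messy=$(mktemp)
printf '  the   day \n\nthe  \n' > "$messy"

counts=$(awk '{
    # NF is 0 on the empty line, so the loop body is simply skipped there.
    for (i = 1; i <= NF; ++i) ++s[$i]
} END {
    for (w in s) print w, s[w]
}' "$messy" | sort -nr -k 2)

printf '%s\n' "$counts"
rm -f "$messy"
```

Only "the 2" and "day 1" are printed; the irregular spacing never produces an empty key in the array.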


Runtime: 16 ms

# Read from the file words.txt and output the word frequency list to stdout.

# try 1
# Note: \n in the replacement is understood by GNU sed; other sed implementations may differ.
sed 's/ \{1,\}/\n/g' words.txt | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2,$1}'
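The sed '/^$/d' stage in this version drops empty lines, which matters when a line of input begins with a space: splitting on spaces then leaves an empty line that would otherwise be counted as a word. The sketch below shows the difference on a made-up input. It uses tr -s for the splitting step instead of the sed substitution, purely so that the example does not depend on how a particular sed handles \n in a replacement; the variable names are illustrative.

```shell
# Hypothetical input: the first line starts with a space.
input=$(printf ' the day\nthe\n')

# Without the filter, the leading space becomes an empty line that uniq -c counts.
without_filter=$(printf '%s\n' "$input" | tr -s ' ' '\n' | sort | uniq -c | awk 'END { print NR }')

# With the filter, empty lines are deleted before counting.
with_filter=$(printf '%s\n' "$input" | tr -s ' ' '\n' | sed '/^$/d' | sort | uniq -c | awk 'END { print NR }')

echo "distinct entries without filter: $without_filter"
echo "distinct entries with filter:    $with_filter"
```

The unfiltered pipeline reports one more distinct entry than the filtered one, and that extra entry is the empty string.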

Original article: https://www.cnblogs.com/strengthen/p/10180228.html

Time: 2024-07-30 02:41:14
