[LeetCode] Word Frequency 单词频率

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity sake, you may assume:

  • words.txt contains only lowercase characters and space ‘ ‘ characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.

For example, assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1

Note:
Don‘t worry about handling ties, it is guaranteed that each word‘s frequency count is unique.

[show hint]

Hint:
Could you write it in one-line using Unix pipes?

这道题给了我们一个文本文件,让我们统计里面单词出现的个数,提示中让我们用管道Pipes来做,在之前那道Tenth Line中,我们使用过管道命令,管道命令的讲解请参见这个帖子。提示中让我们用管道连接各种命令,然后一行搞定,那么我们先来看第一种解法,乍一看啥都不明白,咋办?没关系,容我慢慢来讲解。首先用的关键字是grep命令,该命令一种强大的文本搜索工具,它能使用正则表达式搜索文本,并把匹配的行打印出来,详解请参见这个帖子。后面紧跟的-oE ‘[a-z]+‘参数表示原文本内容变成一个单词一行的存储方式,于是此时文本的内容就变成了:

the
day
is
sunny
the
the
the
sunny
is

下面的sort命令就是用来排序的,参见这个帖子。排完序的结果为:

day
is
is
is
sunny
sunny
the
the
the
the

后面的uniq命令是表示去除重复行命令(参见这个帖子),后面的参数-c表示在每行前加上表示相应行目出现次数的前缀编号,得到结果如下:

1 day
   3 is
   2 sunny
   4 the

然后我们再sort一下,后面的参数-nr表示按数值进行降序排列,得到结果:

4 the
   3 is
   2 sunny
   1 day

而最后的awk命令就是将结果输出,两列颠倒位置即可:

解法一:

grep -oE ‘[a-z]+‘ words.txt | sort | uniq -c | sort -nr | awk ‘{print $2" "$1}‘ 

下面这种方法用的关键字是tr命令,该命令主要用来进行字符替换或者大小写替换,详解请参见这个帖子。后面紧跟的-s参数表示如果发现连续的字符,就把它们缩减为1个,而后面的‘ ’和‘\n‘就是空格和回车,意思是把所有的空格都换成回车,那么第一段命令tr -s ‘ ‘ ‘\n‘ < words.txt 就好理解了,将单词之间的空格都换成回车,跟上面的第一段实现的作用相同,后面就完全一样了,参见上面的讲解:

解法二:

tr -s ‘ ‘ ‘\n‘ < words.txt | sort | uniq -c | sort -nr | awk ‘{print $2, $1}‘

下面这种方法就没有用管道命令进行一行写法了,而是使用了强大的文本分析工具awk进行写类C的代码来实现,这种写法在之前的那道Transpose File已经讲解过了,这里就不多说了,最后要注意的是sort命令的参数-nr -k 2表示按第二列的降序数值排列:

解法三:

awk ‘{
    for (i = 1; i <= NF; ++i) ++s[$i];
} END {
    for (i in s) print i, s[i];
}‘ words.txt | sort -nr -k 2

参考资料:

https://leetcode.com/discuss/33353/my-accepted-answer-using-tr-sort-uniq-and-awk

https://leetcode.com/discuss/46976/my-ac-solution-one-pipe-command-but-cost-20ms

https://leetcode.com/discuss/29001/solution-using-awk-and-pipes-with-explaination

LeetCode All in One 题目讲解汇总(持续更新中...)

时间: 2024-10-09 23:14:47

[LeetCode] Word Frequency 单词频率的相关文章

[Leetcode] word search 单词查询

Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not be us

[LeetCode] Word Abbreviation 单词缩写

Given an array of n distinct non-empty strings, you need to generate minimal possible abbreviations for every word following rules below. Begin with the first character and then the number of characters abbreviated, which followed by the last charact

[LeetCode] Word Squares 单词平方

Given a set of words (without duplicates), find all word squares you can build from them. A sequence of words forms a valid word square if the kth row and column read the exact same string, where 0 ≤ k < max(numRows, numColumns). For example, the wor

LeetCode 0079. Word Search单词搜索【Python】

LeetCode 0079. Word Search单词搜索[Medium][Python][DFS] Problem LeetCode Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horiz

(每日算法)LeetCode --- Word Search(矩阵中查找单词)

在矩阵中查找给定的单词是否出现,可以向上.向下.向左和向右查找.不能在一个字母上重复查找. Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically n

192. Word Frequency

192. Word Frequency QuestionEditorial Solution My Submissions Total Accepted: 5272 Total Submissions: 20228 Difficulty: Medium Write a bash script to calculate the frequency of each word in a text file words.txt. For simplicity sake, you may assume:

[Bash]LeetCode192..统计词频 | Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt. For simplicity sake, you may assume: words.txt contains only lowercase characters and space ' ' characters. Each word must consist of lowercase characters only. Wor

[leetcode]Word Ladder II @ Python

[leetcode]Word Ladder II @ Python 原题地址:http://oj.leetcode.com/problems/word-ladder-ii/ 参考文献:http://blog.csdn.net/doc_sgl/article/details/13341405   http://chaoren.is-programmer.com/ 题意:给定start单词,end单词,以及一个dict字典.要求找出start到end的所有最短路径,路径上的每个单词都要出现在dict

LeetCode: Word Search [079]

[题目] Given a 2D board and a word, find if the word exists in the grid. The word can be constructed from letters of sequentially adjacent cell, where "adjacent" cells are those horizontally or vertically neighboring. The same letter cell may not