[Bash] LeetCode 192. Word Frequency

Write a bash script to calculate the frequency of each word in a text file words.txt.

For simplicity's sake, you may assume:

  • words.txt contains only lowercase characters and space ' ' characters.
  • Each word must consist of lowercase characters only.
  • Words are separated by one or more whitespace characters.


Assume that words.txt has the following content:

the day is sunny the the
the sunny is is

Your script should output the following, sorted by descending frequency:

the 4
is 3
sunny 2
day 1


  • Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
  • Could you write it in one line using Unix pipes?


# Read from the file words.txt and output the word frequency list to stdout.
# sort -nr compares the counts numerically, so the order stays correct for counts of any width.
cat words.txt | tr -s ' ' '\n' | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
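
The one-liner above is easier to follow when each stage of the pipe is on its own line. The sketch below wraps it in a function; the name `word_freq` is made up for illustration and is not part of the problem statement. It assumes the input meets the constraints listed earlier (lowercase words separated by spaces).

```shell
# Hypothetical helper: word_freq FILE prints "word count" pairs, most frequent first.
word_freq() {
    tr -s ' ' '\n' < "$1" |       # put one word per line; -s squeezes repeated separators
        sort |                    # bring identical words together so uniq can count them
        uniq -c |                 # emit "count word" for each distinct word
        sort -nr |                # order by count, numerically, largest first
        awk '{ print $2, $1 }'    # swap the columns to "word count"
}
```

Calling `word_freq words.txt` on the sample file should print the four lines shown in the expected output. Reading the file with `<` instead of `cat` only saves one process; the result is the same.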


# Read from the file words.txt and output the word frequency list to stdout.
awk '{
    for (i = 1; i <= NF; ++i) ++s[$i];
} END {
    for (i in s) print i, s[i];
}' words.txt | sort -nr -k 2


# Read from the file words.txt and output the word frequency list to stdout.
# try 1
sed 's/ \{1,\}/\n/g' words.txt | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2,$1}'
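
The sed version relies on `\n` in the replacement text producing a newline. GNU sed does this, but POSIX does not guarantee it and other sed implementations may behave differently. As a sketch of an alternative splitting step, the function below uses `tr` with the `[:space:]` class, which also covers tabs, and `grep -v '^$'` in place of the second sed to drop empty lines. The name `word_freq_ws` is hypothetical, and the padding of the shorter second set is assumed to work as it does in GNU and BSD `tr`.

```shell
# Hypothetical helper: word_freq_ws FILE counts words separated by any whitespace.
word_freq_ws() {
    tr -s '[:space:]' '\n' < "$1" |   # turn every run of spaces, tabs, or newlines into one newline
        grep -v '^$' |                # drop the empty line left by leading whitespace
        sort |                        # group identical words
        uniq -c |                     # count each group
        sort -nr |                    # highest count first
        awk '{ print $2, $1 }'        # print "word count"
}
```

Usage is the same as for the other solutions: `word_freq_ws words.txt`.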


Time: 2024-07-30 02:41:14

