Write a bash script to calculate the frequency of each word in a text file words.txt.
For simplicity's sake, you may assume:
- words.txt contains only lowercase characters and space ' ' characters.
- Each word must consist of lowercase characters only.
- Words are separated by one or more whitespace characters.
Example:
Assume that words.txt has the following content:
the day is sunny the the the sunny is is
Your script should output the following, sorted by descending frequency:
the 4
is 3
sunny 2
day 1
Note:
- Don't worry about handling ties; it is guaranteed that each word's frequency count is unique.
- Could you write it in one line using Unix pipes?
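For a quick local test, the sample input from the example can be created as follows (a minimal sketch; the filename words.txt and its contents come from the problem statement):

printf 'the day is sunny the the the sunny is is\n' > words.txt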
Solution 1 (4 ms):
# Read from the file words.txt and output the word frequency list to stdout.
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'
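To see what each stage contributes, the pipeline can be cut short on the sample input; this is a sketch of the intermediate result, assuming the sample words.txt created above and GNU coreutils:

# Split into one word per line, group identical words, and count each group:
tr -s ' ' '\n' < words.txt | sort | uniq -c
# expected output (counts right-aligned by uniq; exact padding may vary):
#   1 day
#   3 is
#   2 sunny
#   4 the

The final sort -nr then orders these lines numerically by the leading count, and awk swaps the columns into the required "word count" format.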
Solution 2 (8 ms):
# Read from the file words.txt and output the word frequency list to stdout.
awk '{
    for (i = 1; i <= NF; ++i) ++s[$i];
} END {
    for (i in s) print i, s[i];
}' words.txt | sort -nr -k 2
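The awk program by itself only counts; running the counting stage alone (a sketch, again against the sample words.txt) shows why the trailing sort is needed, since for (i in s) visits the array in an unspecified, implementation-dependent order:

# Counting stage only: prints "word count" pairs in whatever order awk iterates its array.
awk '{ for (i = 1; i <= NF; ++i) ++s[$i] } END { for (i in s) print i, s[i] }' words.txt
# possible output (order not guaranteed):
# sunny 2
# the 4
# day 1
# is 3

sort -nr -k 2 then sorts these lines numerically on the second column, giving the descending-frequency order the problem asks for.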
Solution 3 (16 ms):
# Read from the file words.txt and output the word frequency list to stdout.

# try 1
sed 's/ \{1,\}/\n/g' words.txt | sed '/^$/d' | sort | uniq -c | sort -nr | awk '{print $2,$1}'
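Since the problem asks for a bash script, any of the one-liners above can be saved to a file and run directly; a minimal sketch using the first pipeline (the filename words_frequency.sh is arbitrary):

#!/usr/bin/env bash
# words_frequency.sh -- print "word count" pairs from words.txt, most frequent first.
tr -s ' ' '\n' < words.txt | sort | uniq -c | sort -nr | awk '{ print $2, $1 }'

# Run it:
# chmod +x words_frequency.sh && ./words_frequency.sh
# the 4
# is 3
# sunny 2
# day 1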
Original source: https://www.cnblogs.com/strengthen/p/10180228.html