English Letter Frequency Statistics

Read a file and compute how often each letter of the English alphabet occurs in it.

Source code:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

/**
 * Counts the frequency of each English letter by reading a file.
 *
 * @author LuoPeng
 * @time 2015.3.5
 *
 */
public class CharacterFrequence {

    /**
     * Counts the relative frequency of each letter, ignoring case.
     * @param filePath the path of the file to be read
     * @return a 26-element array (index 0 is 'a', index 25 is 'z'),
     *         or null if the file could not be opened
     */
    public double[] countFrequence ( String filePath ) {

        double[] frequence = null;
        int[] counts = null;
        long totalCharacter = 0L;
        BufferedReader br = null;

        try {
            // Open the file; the constructors throw on failure, so br is never null here
            br = new BufferedReader(new InputStreamReader(new FileInputStream(filePath)));

            // Make and initialize the arrays
            frequence = new double[26];
            counts = new int[26];

            // Read the file line by line
            String line = br.readLine();
            while ( line != null ) {
                line = line.trim().toLowerCase();
                for ( int i = 0; i < line.length(); i++ ) {
                    char c = line.charAt(i);
                    // Count only the letters a-z; 'a' maps to index 0, 'z' to index 25
                    if ( c >= 'a' && c <= 'z' ) {
                        counts[c - 'a']++;
                    }
                }
                line = br.readLine();
            }

            // Calculate the total number of letters
            for ( int i = 0; i < counts.length; i++ ) {
                totalCharacter += counts[i];
            }
            // Calculate the frequency; skipped when there are no letters,
            // which would otherwise divide by zero and yield NaN
            if ( totalCharacter > 0 ) {
                for ( int i = 0; i < frequence.length; i++ ) {
                    frequence[i] = 1.0*counts[i]/totalCharacter;
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            System.out.println("Could not find the file...");
        } catch (IOException e) {
            e.printStackTrace();
            System.out.println("Read message error...");
        } finally {
            // Close the reader; it is still null if the file could not be opened
            if ( br != null ) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                    System.out.println("Close IO exception...");
                }
            }
        }

        return frequence;

    }
}
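As a possible variation on the program above, the counting can be separated from the file handling, and the reader can be closed automatically with try-with-resources. The following is only a minimal sketch under stated assumptions: the class name `LetterFrequencySketch` and its method names are hypothetical and not part of the original program, and the file is assumed to be UTF-8 encoded (the original uses the platform default charset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class, written for illustration only.
final class LetterFrequencySketch {

    /** Adds the occurrences of a-z (case-insensitive) in text to counts; other characters are ignored. */
    static void addCounts(CharSequence text, int[] counts) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                c = (char) (c - 'A' + 'a');   // fold ASCII upper case to lower case
            }
            if (c >= 'a' && c <= 'z') {
                counts[c - 'a']++;            // 'a' -> index 0, ..., 'z' -> index 25
            }
        }
    }

    /** Converts raw counts to relative frequencies; returns all zeros if nothing was counted. */
    static double[] toFrequencies(int[] counts) {
        long total = 0;
        for (int n : counts) {
            total += n;
        }
        double[] freq = new double[counts.length];
        if (total == 0) {
            return freq;                      // avoid dividing by zero
        }
        for (int i = 0; i < counts.length; i++) {
            freq[i] = (double) counts[i] / total;
        }
        return freq;
    }

    /** Reads a file (assumed UTF-8) line by line and returns the 26 letter frequencies. */
    static double[] fromFile(String filePath) throws IOException {
        int[] counts = new int[26];
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = Files.newBufferedReader(Paths.get(filePath), StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                addCounts(line, counts);
            }
        }
        return toFrequencies(counts);
    }
}
```

In this sketch, I/O errors are propagated to the caller instead of being printed, so the caller decides how to report a missing or unreadable file. Keeping `addCounts` free of file access also makes it easy to try on a plain string.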

Results:

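The two charts above show the results for two texts, "The Sorrows of Young Werther" and "Anna Karenina", respectively. The vertical axis is the frequency of occurrence and the horizontal axis is the letter (1 corresponds to a, 2 to b, ..., 26 to z).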
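The mapping between positions and letters described above (1 for a, 26 for z) can also be shown in a plain-text chart. The sketch below is an illustration only, not the tool that produced the original charts: the class `FrequencyChartSketch`, the method `render`, and the `width` parameter are hypothetical names, and the input is assumed to be a 26-element array like the one returned by `countFrequence`.

```java
import java.util.Locale;

// Hypothetical helper class, written for illustration only.
final class FrequencyChartSketch {

    /**
     * Renders one line per letter: position (1-26), letter, percentage,
     * and a bar of '#' scaled so that the most frequent letter gets `width` marks.
     */
    static String render(double[] frequencies, int width) {
        double max = 0.0;
        for (double f : frequencies) {
            max = Math.max(max, f);
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < frequencies.length; i++) {
            char letter = (char) ('a' + i);   // index 0 -> 'a', index 25 -> 'z'
            int bar = max > 0 ? (int) Math.round(frequencies[i] / max * width) : 0;
            sb.append(String.format(Locale.ROOT, "%2d %c %5.1f%% ", i + 1, letter, frequencies[i] * 100));
            for (int j = 0; j < bar; j++) {
                sb.append('#');
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling `System.out.print(FrequencyChartSketch.render(frequencies, 40))` on the returned array would print 26 lines, one per letter, with the longest bar belonging to the most frequent letter.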

Time: 2024-11-16 14:45:02
