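Counting and Sorting the Occurrences of English Letters and English Words in a Text

1. Count the occurrences of English letters in a text and print each letter's percentage in descending order

Source code: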

package total;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;

public class Statistics_letter {

    public static void main(String[] args) throws IOException  {
        // Read the whole file into memory. Line breaks are dropped,
        // which does not affect the letter counts.
        StringBuffer str=new StringBuffer();
        // try-with-resources closes the reader even if reading fails
        try(BufferedReader bufr=new BufferedReader(new FileReader("a.txt"))) {
            String line;
            while((line=bufr.readLine())!=null) {
                str.append(line);
            }
        }

        // Occurrence counts for 'A'-'Z' and 'a'-'z'; kept as double so the
        // later division by count yields a fraction
        double capitalletter[]=new double[26];
        double lowercaseletter[]=new double[26];
        int count=0; // total number of English letters
        for(int i=0;i<str.length();i++) {
            char ch=str.charAt(i);
            if(ch>='A'&&ch<='Z') {
                capitalletter[ch-'A']++;
                count++;
            } else if(ch>='a'&&ch<='z') {
                lowercaseletter[ch-'a']++;
                count++;
            }
        }

        // Fraction of each letter: indexes 0-25 are 'A'-'Z', 26-51 are 'a'-'z'.
        // The count==0 check avoids NaN when the file contains no letters.
        double percentage[]=new double[52];
        for(int i=0;i<26;i++) {
            percentage[i]=count==0?0:capitalletter[i]/count;
            percentage[i+26]=count==0?0:lowercaseletter[i]/count;
        }
        // On each pass, print the largest entry not printed yet. Tracking printed
        // entries keeps letters with equal percentages from being printed twice.
        boolean printed[]=new boolean[52];
        for(int i=0;i<52;i++) {
            int max=-1;
            for(int j=0;j<52;j++) {
                if(!printed[j]&&(max<0||percentage[j]>percentage[max]))
                    max=j;
            }
            printed[max]=true;
            if(max>=26)
                System.out.print(((char)('a'+max-26))+": ");
            else
                System.out.print(((char)('A'+max))+": ");
            System.out.println(String.format("%.2f",percentage[max]*100)+'%');
        }
        System.out.println("Total number of English letters: "+count);
    }
}

Screenshot of the run result: (image not available)
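
As a point of comparison, the counting and descending ordering above can also be written with integer counts and a stable sort over letter indexes. This is only a minimal sketch under stated assumptions: the class `LetterFrequency` and its methods `countLetters` and `lettersByFrequency` are hypothetical names that do not appear in the original post, and the input is a string literal rather than the file `a.txt`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of the original post.
class LetterFrequency {

    // Counts ASCII letters: indexes 0-25 hold 'A'-'Z', indexes 26-51 hold 'a'-'z'.
    static int[] countLetters(String text) {
        int[] counts = new int[52];
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch >= 'A' && ch <= 'Z') {
                counts[ch - 'A']++;
            } else if (ch >= 'a' && ch <= 'z') {
                counts[26 + ch - 'a']++;
            }
        }
        return counts;
    }

    // Returns the letters that occur at least once, most frequent first.
    // List.sort is stable, so letters with equal counts keep index order.
    static List<Character> lettersByFrequency(int[] counts) {
        List<Integer> indexes = new ArrayList<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                indexes.add(i);
            }
        }
        indexes.sort((x, y) -> Integer.compare(counts[y], counts[x]));
        List<Character> letters = new ArrayList<>();
        for (int i : indexes) {
            letters.add((char) (i < 26 ? 'A' + i : 'a' + i - 26));
        }
        return letters;
    }

    public static void main(String[] args) {
        int[] counts = countLetters("Hello, World");
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        for (char letter : lettersByFrequency(counts)) {
            int index = letter <= 'Z' ? letter - 'A' : 26 + letter - 'a';
            String percent = String.format("%.2f", 100.0 * counts[index] / total);
            System.out.println(letter + ": " + percent + "%");
        }
    }
}
```

Sorting indexes instead of the percentage values keeps each value attached to its letter, so no second lookup pass is needed and equal percentages cannot be confused with one another.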

2. Read the English words in a text and print them in descending order of occurrence count

Source code:

package total;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.util.HashMap;
import java.util.Map;

public class Statistics_words {
    public Map<String, Integer> map1 = new HashMap<String, Integer>();

    public static void main(String arg[]) throws IOException {
        final int MAXNUM = 20; // number of most frequent words to report

        String sz[] = new String[MAXNUM];    // selected words, most frequent first
        Integer num[] = new Integer[MAXNUM]; // occurrence count of each selected word
        Statistics_words statistics = new Statistics_words();
        statistics.textImport();
        System.out.println("Word occurrence counts in the text:");

        // On each pass, pick the most frequent word still in the map, then remove it.
        int g_run;
        for (g_run = 0; g_run < MAXNUM; g_run++) {
            for (Map.Entry<String, Integer> it : statistics.map1.entrySet()) {
                if (it.getKey().isEmpty())
                    continue; // skip an empty "word" left by consecutive separators
                if (sz[g_run] == null || num[g_run] < it.getValue()) {
                    sz[g_run] = it.getKey();
                    num[g_run] = it.getValue();
                }
            }
            if (sz[g_run] == null)
                break; // the text has fewer than MAXNUM distinct words
            statistics.map1.remove(sz[g_run]);
        }

        // Print the ranking and collect the same lines for the output file.
        StringBuilder tx1 = new StringBuilder();
        for (int i = 0; i < g_run; i++) {
            String line = "Rank " + (i + 1) + " word: " + sz[i] + "\t\t\tcount: " + num[i];
            tx1.append(line).append("\r\n");
            System.out.println(line);
        }
        statistics.textExport(tx1.toString());
    }

    public void textImport() throws IOException {
        File a = new File("C:\\Users\\22400\\Desktop\\a.txt");
        // try-with-resources closes the reader and the underlying stream
        try (InputStreamReader c = new InputStreamReader(new FileInputStream(a), "UTF-8")) {
            StringBuilder string2 = new StringBuilder(); // letters of the word being read
            int ch;
            // read() returns -1 at end of stream; ready() only reports whether
            // a read would block, so it is not a reliable end-of-file test
            while ((ch = c.read()) != -1) {
                if (isWord((char) ch)) {
                    string2.append((char) ch);
                } else if (string2.length() > 0) {
                    // a non-letter ends the current word; empty words are not counted
                    map1.merge(string2.toString(), 1, Integer::sum);
                    string2.setLength(0);
                }
            }
            // count the last word if the text ends with a letter
            if (string2.length() > 0) {
                map1.merge(string2.toString(), 1, Integer::sum);
            }
        }
    }

    public void textExport(String txt) throws IOException {
        File fi = new File("StatisticsWord.txt");
        FileOutputStream fop = new FileOutputStream(fi);
        OutputStreamWriter ops = new OutputStreamWriter(fop, "UTF-8");
        ops.append(txt);
        ops.close();
        fop.close();
    }

    // Returns true if the character is an ASCII letter, i.e. part of a word.
    public boolean isWord(char a) {
        return (a >= 'a' && a <= 'z') || (a >= 'A' && a <= 'Z');
    }

}

Screenshot of the run result: (image not available)
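
The ranking step above repeatedly scans the map for the largest remaining entry. A possible alternative, shown here only as a sketch, is to copy the entries into a list and sort it once by count. The class `WordFrequency` and its methods `countWords` and `topWords` are hypothetical names introduced for illustration, the input is a string literal rather than a file, and the alphabetical tie-break is an assumption that the original program does not make.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the original post.
class WordFrequency {

    // Splits the text on every non-letter character and counts each word.
    // Words are case-sensitive, as in the program above.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i <= text.length(); i++) {
            // One extra pass with a space acts as a final separator for the last word.
            char ch = i < text.length() ? text.charAt(i) : ' ';
            if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')) {
                word.append(ch);
            } else if (word.length() > 0) {
                counts.merge(word.toString(), 1, Integer::sum);
                word.setLength(0);
            }
        }
        return counts;
    }

    // Returns at most `limit` entries, highest count first.
    // Ties are broken alphabetically (an assumption made for this sketch).
    static List<Map.Entry<String, Integer>> topWords(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("the cat and the dog and the bird");
        int rank = 1;
        for (Map.Entry<String, Integer> entry : topWords(counts, 20)) {
            System.out.println("Rank " + rank + " word: " + entry.getKey()
                    + "\tcount: " + entry.getValue());
            rank++;
        }
    }
}
```

Unlike the selection loop, this version leaves the map unchanged, so the counts remain available after the ranking has been printed.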

Original article: https://www.cnblogs.com/weixiao1717/p/11795692.html

Time: 2024-11-02 09:22:56
