HDU 1053 Entropy (Huffman coding, greedy + priority queue)

Problem link:

http://acm.hdu.edu.cn/showproblem.php?pid=1053

Entropy

Time Limit: 2000/1000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 7233    Accepted Submission(s): 3047

Problem Description

An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.

English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.

In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint was not enforced, then such a decoding would be impossible.

Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.
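
Walking that decode explicitly: reading “0000010110111” from left to right gives “0” → A five times, then “10” → B, “110” → C, and “111” → D, recovering “AAAAABCD” with no separators needed — exactly because no codeword is a prefix of another.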

As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”, “I” and “N” only occur once, however, so they will have the longest codes.

There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with “100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.
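
As a quick check of that count: the 18 characters break down as 4 spaces and 4 T’s at 2 bits each, 3 H’s and 2 A’s at 3 bits each, and 2 E’s plus one each of C, I and N at 4 bits each, giving 4×2 + 4×2 + 3×3 + 2×3 + 2×4 + 1×4 + 1×4 + 1×4 = 51 bits.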

Input

The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.

Output

For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal point.

Sample Input

AAAAABCD
THE_CAT_IN_THE_HAT
END

Sample Output

64 13 4.9
144 51 2.8

Source

Greater New York 2000

Recommend

We have carefully selected several similar problems for you:  1051 1054 1052 3177 1055

The task: given a list of strings (one per line), for each string output the length of its 8-bit ASCII encoding, the length of an optimal Huffman encoding, and the compression ratio between the two.

Since only the total Huffman-encoded length is required, not the actual codewords, there is no need to build the Huffman tree explicitly. Instead, put the character frequencies into a min-priority queue, then repeatedly pop the two smallest elements, add their sum to a running answer, and push the sum back, until only one element remains. The running answer is the optimal encoded length in bits: each merge adds a symbol's frequency once for every edge on its path to the root of the implicit Huffman tree, i.e. frequency × code length.
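
As a concrete trace, take “AAAAABCD” with frequencies A:5, B:1, C:1, D:1. The queue evolves as follows: pop 1 and 1, push 2 (running total 2); pop 1 and 2, push 3 (running total 5); pop 3 and 5, push 8 (running total 13). The final total of 13 bits matches the optimal encoding given in the problem statement.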

Watch out for the special case where the string contains only one distinct character: even a lone symbol needs one bit per occurrence, so the optimal length equals the string length and the ratio is exactly 8.0.

Code:

#include<bits/stdc++.h>
using namespace std;
int main()
{
    string str;
    while(cin>>str)
    {
        if(str=="END")
            break;
        int l=str.length();
        int a[27]={0};//bucket 0 for '_', buckets 1..26 for 'A'..'Z'
        for(int i=0;i<l;i++)
        {
            if(str[i]=='_')
            {
                a[0]++;
            }else
            {
                a[str[i]-'A'+1]++;//count character frequencies
            }
        }
        int f=0;
        for(int i=0;i<27;i++)//special case: the string consists of a single distinct character
        {
            if(a[i]==l)
            {
                f=1;
                break;
            }
        }
        if(f==1)
        {
            printf("%d %d 8.0\n",l*8,l);
            continue;
        }
        //repeatedly pop the two lowest-frequency elements, merge them into a new node,
        //and push it back, until only one element is left in the queue
        priority_queue<int,vector<int>,greater<int> > q;//min-heap for the total Huffman code length
        for(int i=0;i<27;i++)
        {
            if(a[i]!=0)
                q.push(a[i]);//push each nonzero frequency
        }
        int ans=0;
        int x,y;
        while(1)
        {
            x=q.top(),q.pop();
            if(q.empty())
                break;
            y=q.top(),q.pop();
            ans+=x+y;//each merge contributes its combined weight to the total length
            q.push(x+y);
        }
        printf("%d %d %0.1lf\n",l*8,ans,double(l*8.0/(ans*1.0)));
    }
    return 0;
}
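
For comparison, below is a minimal sketch of the explicit-tree alternative mentioned above: build the actual Huffman tree with a priority queue of (frequency, node index) pairs, then sum frequency × depth over the leaves. The names here (Node, huffmanBits) are illustrative, not from the original solution; it produces the same totals as the integer-queue shortcut.

#include<bits/stdc++.h>
using namespace std;

struct Node { int freq, left, right; };//left/right == -1 marks a leaf

//total encoded length in bits for the given leaf frequencies
int huffmanBits(const vector<int>& freqs)
{
    if(freqs.empty())
        return 0;
    vector<Node> pool;
    //min-heap of (frequency, node index)
    priority_queue<pair<int,int>, vector<pair<int,int> >, greater<pair<int,int> > > pq;
    for(int f : freqs)
    {
        pool.push_back({f,-1,-1});
        pq.push({f,(int)pool.size()-1});
    }
    if(pq.size()==1)
        return freqs[0];//single distinct glyph: 1 bit per occurrence
    while(pq.size()>1)//same greedy merge as above, but keeping the tree structure
    {
        pair<int,int> x=pq.top(); pq.pop();
        pair<int,int> y=pq.top(); pq.pop();
        pool.push_back({x.first+y.first, x.second, y.second});
        pq.push({x.first+y.first,(int)pool.size()-1});
    }
    //sum frequency*depth over the leaves with an iterative DFS
    int bits=0;
    vector<pair<int,int> > st;//(node index, depth)
    st.push_back({pq.top().second,0});
    while(!st.empty())
    {
        pair<int,int> cur=st.back(); st.pop_back();
        if(pool[cur.first].left==-1)
            bits+=pool[cur.first].freq*cur.second;//leaf: frequency times code length
        else
        {
            st.push_back({pool[cur.first].left, cur.second+1});
            st.push_back({pool[cur.first].right, cur.second+1});
        }
    }
    return bits;
}

int main()
{
    //"AAAAABCD": frequencies 5,1,1,1 -> prints 13, matching the trace above
    cout<<huffmanBits({5,1,1,1})<<endl;
    return 0;
}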

Original post: https://www.cnblogs.com/yinbiao/p/9397260.html
