A Simple Analysis of Huffman Coding

1. What problem are we facing?

/**
*    Q: What problem are we facing?
*
*    A: Data compression is still a problem, even now. We want data to take up
*        less space, and that wish grows stronger whenever we have to transmit
*        data. Before reading on, it may help to try to come up with a workable
*        way to compress data yourself. I tried, and obviously failed. That is
*        why I wrote this article. ^_^
*/

2. How can I solve it?

/**
*    Q: How can I solve it?
*
*    A: Where there is a problem there is an answer, although not always the
*        best one. In 1951 David A. Huffman devised an algorithm, which he
*        published in 1952. Unlike an ordinary fixed-length code, it is a
*        variable-length code: different symbols get codewords of different
*        lengths. That raises two questions:
*
*        No.1: Is a variable-length code possible at all? How can we know where
*                the codeword of the current symbol ends?
*
*                The answer is a prefix code, in which no codeword is the prefix
*                of another. Consider a tree like the following:
*
*
*                                         O
*                                   1 /     \ 0
*                                    O       O
*                               1 /    \ 0   c
*                                O      O
*                                a       b
*
*                This is a simple binary tree with three leaf nodes: a, b and c. We
*                label every left branch 1 and every right branch 0. To reach leaf a
*                from the root, the path is 11. In the same way we get the paths to
*                all the leaves:
*                        a : 11
*                        b : 10
*                        c : 0
*
*                Almost by accident, we have obtained a variable-length code. Because
*                symbols sit only at the leaves, no codeword is a prefix of another,
*                so a decoder always knows where a codeword ends.
*
*
*        No.2: How can we use a variable-length code to compress a series of symbols?
*
*                Now that we can build variable-length codes, something interesting
*                happens. Imagine data made up of a series of symbols, where some
*                symbols occur frequently and others rarely. If we give the shorter
*                codewords to the frequent symbols, the data takes less space than
*                with a fixed-length code. That is what we want.
*
*        We now know that data can be compressed with a variable-length code. The
*        next question is which variable-length code we want: what kind of code is
*        optimal?
*/

3. What is Huffman coding?

/**
*    Q: What is Huffman coding?
*
*    A: The problem now is how to build an optimal tree. Do you have any idea?
*        Huffman introduced an algorithm for it. It is a greedy algorithm: simple,
*        but the result is valid (more on that below). The simplest construction
*        uses a priority queue in which the node with the lowest probability has
*        the highest priority. The steps are as follows:
*
*        1. Create a leaf node for each symbol and add it to the priority queue.
*        2. While there is more than one node in the queue:
*            1. Remove the two nodes with the highest priority (the lowest probability).
*            2. Create a new node as the parent of those two nodes. Its probability
*                is the sum of the two nodes' probabilities.
*            3. Add the new node to the queue.
*        3. The remaining node is the root of the tree. Read the codes off the tree
*            as we did above.
*
*/

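4. Is it optimal?

/**
*    Q: Is it optimal?
*
*    A: Hard to say without a way to measure it. The usual measure is the average
*        codeword length, that is, the sum over all symbols of probability times
*        codeword length. By that measure Huffman's construction is known to be
*        optimal among prefix codes that encode each symbol separately, given the
*        symbol probabilities. I have no proof of my own to offer here, so advice
*        from other people is welcome; I believe there must be some exciting ideas.
*        By the way, this article only covers compressing independent symbols.
*        Another important issue is correlated symbols, which may be a much harder
*        problem.
*
*/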

5. Source code

/**
*    Here is a simple example
*/

#include <stdio.h>
#include <iostream>

/**
*    The container below keeps a state for every slot: ET_VALID means the slot holds an
*    element, ET_INVALID means the slot is free.
*/
enum ELEM_TYPE {
        ET_VALID,
        ET_INVALID,
        ET_MAX,
};

typedef int    INDEX;

/**
*    This is a container: we push elements into it and pop them by priority. It is
*    a class template since we don't know the element type in advance.
*/
template <class ELEM>
class Container {
        public:
                Container( int capacity);
                ~Container( );
                /*
            *    push an element into this container.
            */
                bool push( ELEM item);
                /*
            *    pop an element from this container; the smallest one has the highest priority.
            *    The element type must therefore overload operator '<'.
            */
                bool pop( ELEM &item );

        private:
                bool _find_idle( INDEX &num);
                bool _set_elem( INDEX num, ELEM &elem);
                bool _get_elem( INDEX num, ELEM &elem);

                ELEM                *ele;
                ELEM_TYPE    *stat;
                int                        cap;
};

template <class ELEM>
Container<ELEM>::Container(  int capacity)
{
        this->ele = new ELEM[capacity] ;
        this->stat = new ELEM_TYPE[capacity];

        int        i;
        for( i=0; i<capacity; i++)
                this->stat[i] = ET_INVALID;

        this->cap = capacity ;
}

template <class ELEM>
Container<ELEM>::~Container(  )
{
        if( this->ele!=NULL )
                delete []this->ele;

        if( this->stat!=NULL )
                delete []this->stat;

        this->cap = 0;
}

template <class ELEM>
bool Container<ELEM>::push( ELEM item)
{
        INDEX        num = -1;

        if( (!this->_find_idle( num))
                ||(!this->_set_elem( num, item)))
                return false;

        return true;
}

template <class ELEM>
bool Container<ELEM>::pop( ELEM &item )
{
        INDEX    i = 0;
        INDEX    Min;

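        /*
       *    find the first valid element. Check the bound before reading stat[i],
       *    otherwise an empty container is read one slot past the end.
       */
        while( ( i<this->cap)
                        &&(this->stat[i]!=ET_VALID))
                            i++;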

        for( Min = i ; i<this->cap; i++)
        {
                if(  ( this->stat[i]==ET_VALID)
                      &&( this->ele[i]<this->ele[Min]))
                    {
                            Min = i;
                    }
        }

        return this->_get_elem( Min, item);
}

template <class ELEM>
bool Container<ELEM>::_find_idle( INDEX &num)
{
        INDEX        i;
        for( i=0; i<this->cap; i++)
        {
                if( this->stat[i]==ET_INVALID )
                {
                        num = i;
                        return true;
                }
        }

        return false;
}

template <class ELEM>
bool Container<ELEM>::_set_elem( INDEX num, ELEM &elem)
{
        if( (num>=this->cap)
                ||(num<0) )
                    return false;

        this->stat[num] = ET_VALID;
        this->ele[num] = elem;

        return true;
}

template <class ELEM>
bool Container<ELEM>::_get_elem( INDEX num, ELEM &elem)
{
        if( (num<0)
                ||(num>=this->cap))
                    return false;

        this->stat[num] = ET_INVALID;
        elem =  this->ele[num];

        return true;
}

/**
*    define a type of symbol. It will be used to record all information about a symbol.
*/
typedef char SYMINDEX;
typedef int SYMFRE;

class Symbol {
        public:
                /*
            *    In the Huffman tree, a parent's frequency is the sum of its two children.
            *    For convenience we overload operator '+' to compute it.
            */
                Symbol operator + ( Symbol &s);
                SYMINDEX        sym;
                SYMFRE            freq;
};

Symbol Symbol::operator +( Symbol &s)
{
        Symbol        ret;
        ret.sym = '\0';
        ret.freq = this->freq + s.freq;
        return ret;
}

/**
*    define a node of binary tree. It will be used to create a Huffman tree.
*/
class HTreeNode {
        public:
                /*
            *    The container needs to compare two nodes, so this node must
            *    overload operator '<'.
            */
                bool operator< ( HTreeNode &n);

                HTreeNode        *lchild;
                HTreeNode        *rchild;
                Symbol                sym;
};

bool HTreeNode::operator < ( HTreeNode &n)
{
        return this->sym.freq < n.sym.freq;
}

/**
*    This is the core structure. It builds a Huffman code from the input symbols.
*/
class HuffmanCoding {
        public:
                HuffmanCoding( );
                ~HuffmanCoding( );
                bool Set( Symbol s[], int num);
                bool Work( void);

	private:
                /*
            *    create a Huffman tree.
            */
                bool CreateTree(Symbol s[], int num );
                bool DestroyTree( );
                /*
            *    read Huffman coding from a Huffman tree.
            */
                bool ReadCoding( );
                bool TravelTree( HTreeNode *parent, char *buf, INDEX cur);

                Symbol                *sym ;
                int                        sym_num ;
                HTreeNode        *root ;
};

HuffmanCoding::HuffmanCoding( )
{
        this->sym = NULL;
        this->sym_num = 0;
        this->root = NULL;
}

HuffmanCoding::~HuffmanCoding( )
{
        if( this->sym!=NULL)
                delete []this->sym;

        this->sym_num = 0;
        this->DestroyTree( );
}

/**
*    receive data from outside. Strictly speaking this function is not necessary, but it
*    keeps the algorithm itself more concise.
*/
bool HuffmanCoding::Set( Symbol s [ ], int num)
{
        this->DestroyTree( );

        /*
       *    release the previous copy first, and reject bad input before allocating.
       */
        delete []this->sym;
        this->sym = NULL;
        this->sym_num = 0;

        if( (NULL==s) ||(num<=0))
                return false;

        this->sym = new Symbol[num];
        for( int i=0; i<num; i++)
                this->sym[i] = s[i];

        this->sym_num = num;
        return true;
}
/**
*    The core function. Here we create a Huffman tree, then read the codes from it.
*/
bool HuffmanCoding::Work( void)
{

        //Create a Huffman tree
        if( !this->CreateTree( this->sym, this->sym_num))
                return false;
        //read Huffman coding
        if( !this->ReadCoding( ))
                return false;

        return true;
}

bool HuffmanCoding::CreateTree( Symbol s[], int num)
{
        /*
       *    without at least one symbol there is no tree to build.
       */
        if( (NULL==s) ||(num<=0))
                return false;

        /*
       *    drop any tree built by an earlier call.
       */
        this->DestroyTree( );

        /*
       *    create a priority tank. It always pops the element with the highest priority,
       *    that is, the lowest frequency.
       */
        Container<HTreeNode>    tank(num);
        for( int i=0; i<num; i++)
        {
                HTreeNode        node;
                node.lchild = NULL;
                node.rchild = NULL;
                node.sym = s[i];
                tank.push( node);
        }
        /*
       *    always pop two nodes. If the second pop fails, only one node was left; it is
       *    now in node1 and it is the root of this Huffman tree.
       */
        HTreeNode        node1;
        HTreeNode        node2;
        while(  tank.pop( node1)
                        && tank.pop( node2) )
        {
                HTreeNode        parent;
                parent.lchild = new HTreeNode;
                parent.rchild = new HTreeNode;
                *parent.lchild = node1;
                *parent.rchild = node2;
                parent.sym = node1.sym + node2.sym;
                /*
            *    push the new node into the tank.
            */
                tank.push( parent);
        }

        this->root = new HTreeNode(node1);

        return true;
}

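/*
*    free a subtree recursively; a NULL pointer means there is nothing to free.
*/
static void FreeSubtree( HTreeNode *node)
{
        if( node==NULL)
                return;

        FreeSubtree( node->lchild);
        FreeSubtree( node->rchild);
        delete node;
}

bool HuffmanCoding::DestroyTree( )
{
        FreeSubtree( this->root);
        this->root = NULL;

        return true;
}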

bool HuffmanCoding::ReadCoding( )
{
        char        *code;
        code = new char[this->sym_num + 1];
        /*
       *    traverse the Huffman tree and print the code of every valid symbol.
       */
        this->TravelTree( this->root, code, 0);

        delete []code;

        return true;
}

#define        LCHAR    '1'
#define        RCHAR    '0'

bool HuffmanCoding::TravelTree( HTreeNode *parent, char *buf, INDEX cur)
{
        buf[cur] = '\0';
        if( (parent->lchild==NULL)
                &&(parent->rchild==NULL) )
                {//end node
                        printf("[ %c] : %s\n", parent->sym.sym, buf);
	        }

        if( parent->lchild!=NULL )
        {
                buf[cur] = LCHAR;
                this->TravelTree( parent->lchild, buf, cur + 1);
        }

        if( parent->rchild!=NULL )
        {
                buf[cur] = RCHAR;
                this->TravelTree( parent->rchild, buf, cur + 1);
        }

        return true;
}

static Symbol	sArr[ ] = {
        { '0', 0},
        { '1', 1},
        { '2', 2},
        { '3', 3},
        { '4', 4},
        { '5', 5},
        { '6', 6},
        { '7', 7},
        { '8', 8},
        { '9', 9},
};

int main()
{
        HuffmanCoding    hcoding;
        hcoding.Set( sArr, 10);
        hcoding.Work( );

        return 0;
}


Time: 2024-10-10 05:45:41
