Static dictionary methods of text compression

  This note introduces one way to compress text. When we have a large amount of data and the data shares a similar structure, we can exploit that structure to compress it better. Text is a common case: most of the time we can choose a compression method that matches the structure of the text.

  Dictionary methods fall into two categories: static dictionary methods and adaptive (dynamic) dictionary methods.

  Here I briefly describe the first kind, using a small example program.

  If we know a lot about the structure of a text in advance, we can use a static dictionary method. It can be implemented in many ways depending on the situation, but one scheme popular with programmers is double-letter coding, usually called digram coding.

  A simple example makes the method clearer.

  Suppose a source emits only five letters: 'a', 'b', 'c', 'd' and 'r'. Based on what we know about the source, we build a dictionary that assigns a 3-bit code to each letter and to three pairs of letters. The dictionary is

code entry
000 a
001 b
010 c
011 d
100 r
101 ab
110 ac
111 ad
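
  The dictionary above is chosen from prior knowledge of the source. As a rough sketch of how such a table might be filled in when only a sample text is available, the hypothetical helper below (build_digram_dict is my own name, not part of the original routine) gives every letter a fixed-width code and then spends the leftover codes on the most frequent adjacent pairs. Counting overlapping pairs and breaking ties alphabetically are assumptions made for this illustration, so the result need not match the table above.

```python
from collections import Counter


def build_digram_dict(sample, code_bits=3):
    """Hypothetical helper: fixed-width codes for letters, then frequent pairs."""
    letters = sorted(set(sample))
    capacity = 2 ** code_bits
    if len(letters) > capacity:
        raise ValueError('code_bits is too small for the alphabet')
    # Count every overlapping pair of adjacent letters in the sample.
    pair_counts = Counter(sample[i:i + 2] for i in range(len(sample) - 1))
    # Most frequent pairs first; ties are broken alphabetically (an assumption).
    ranked = sorted(pair_counts, key=lambda pair: (-pair_counts[pair], pair))
    # Letters always get a code; pairs fill whatever codes are left.
    entries = letters + ranked[:capacity - len(letters)]
    return {entry: format(i, '0{}b'.format(code_bits))
            for i, entry in enumerate(entries)}


if __name__ == '__main__':
    for entry, code in build_digram_dict('abracadabra').items():
        print(code, entry)
```

  With five letters and 3-bit codes there are only three codes left for pairs, so the choice of pairs matters: a static dictionary helps only when the text to be coded resembles the sample it was built from.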

  Now let us encode the sequence 'abracadabra'.

  The coder first reads the first two letters, 'ab', and checks whether that pair is in the dictionary. If it is, the coder outputs the pair's code and moves on to the next two letters; otherwise it outputs the code of the first letter only and forms the next pair starting from the second letter. Here 'ab' is in the dictionary, so the coder outputs '101'. Next it reads 'ra', which is not in the dictionary, so it outputs the code of 'r', '100', and pairs the remaining 'a' with the following 'c' to form 'ac', which gives '110'. Then it reads 'ad' and outputs '111'. The rest of the sequence is handled the same way: 'ab' gives '101', 'r' gives '100', and the final 'a', left on its own, gives '000'.

  The output is '101100110111101100000', which is 21 bits; coding each of the 11 letters separately with 3 bits would take 33.

  The routine, written in Python, is as follows.

def getCodeDict():
    # Static dictionary: each letter and each known pair maps to a 3-bit code.
    return {
        'a': '000',
        'b': '001',
        'c': '010',
        'd': '011',
        'r': '100',
        'ab': '101',
        'ac': '110',
        'ad': '111',
    }

def compress(code):
    print('start to compress')
    result = ''
    codeDict = getCodeDict()
    unCodedCode = code
    while unCodedCode != '':
        # Look at the next two letters (or the single last letter).
        targetCode = unCodedCode[0:2]
        if targetCode in codeDict:
            # Entry found: emit its code and skip past it.
            result = result + codeDict[targetCode]
            offset = 2
        else:
            # Pair not found: emit the first letter's code and advance one letter.
            result = result + codeDict[targetCode[0]]
            offset = 1
        unCodedCode = unCodedCode[offset:]
    print('complete to compress')
    return result

if __name__ == '__main__':
    signals = 'abracadabra'
    result = compress(signals)
    print(result)
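
  The original routine only compresses. Because every code in the dictionary is exactly three bits long, decoding can be sketched as cutting the bit string into 3-bit chunks and looking each chunk up in the inverted dictionary. The function name decompress and the CODE_DICT constant below are my own additions for illustration; the table is copied from the example above.

```python
# Same static dictionary as in the example above.
CODE_DICT = {
    'a': '000', 'b': '001', 'c': '010', 'd': '011', 'r': '100',
    'ab': '101', 'ac': '110', 'ad': '111',
}


def decompress(bits, code_dict=CODE_DICT, code_bits=3):
    """Hypothetical decoder: map each fixed-width chunk back to its entry."""
    if len(bits) % code_bits != 0:
        raise ValueError('bit string length is not a multiple of the code width')
    # Invert the dictionary: code -> letter or pair of letters.
    decode = {code: entry for entry, code in code_dict.items()}
    chunks = (bits[i:i + code_bits] for i in range(0, len(bits), code_bits))
    return ''.join(decode[chunk] for chunk in chunks)


if __name__ == '__main__':
    print(decompress('101100110111101100000'))
```

  Fixed-width codes are what make this simple: the decoder never has to guess where one code ends and the next begins.
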
Time: 2024-08-08 14:41:31
