Hadoop Learning Notes (2)

Hadoop serialization: the variable-length encoding used for Long and Int:

If the integer lies in [-112, 127], only one byte is needed: the first byte itself is the value.

If the value is greater than 127, the first byte lies in [-120, -113], and the number of payload bytes that follow is (-112 - first byte), at most eight bytes.

If the value is less than -112, the first byte lies in [-128, -121], and the number of payload bytes that follow is (-120 - first byte), again at most eight bytes.
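
For example, 300 encodes as 0x8E 0x01 0x2C: the marker byte -114 means "positive, two payload bytes", and the payload 0x012C is 300 written big-endian. Likewise -300 encodes as 0x86 0x01 0x2B, since a negative value stores its one's complement, ~(-300) = 299 = 0x012B. A minimal sketch of just the size rule, under these definitions (class and method names are made up for illustration):

public class VLongSizeDemo {
  // How many bytes the variable-length encoding described above needs.
  static int vlongSize(long i) {
    if (i >= -112 && i <= 127) return 1; // value fits in the first byte itself
    if (i < 0) i = ~i;                   // negatives store their one's complement
    int payload = 0;
    while (i != 0) {                     // count payload bytes, at most eight
      i >>>= 8;
      payload++;
    }
    return 1 + payload;                  // marker byte + payload bytes
  }

  public static void main(String[] args) {
    System.out.println(vlongSize(100));  // 1
    System.out.println(vlongSize(300));  // 3
    System.out.println(vlongSize(-300)); // 3
  }
}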

Sign-magnitude ----- one's complement (sign bit unchanged, all other bits inverted) ----- two's complement (sign bit unchanged, add 1 to the one's complement)

For example, -5:

sign-magnitude (1000 0101) ---- one's complement (1111 1010) ---- two's complement (1111 1011)
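
Java's built-in integers are two's complement, so this chain can be checked directly in plain Java (nothing Hadoop-specific):

public class ComplementDemo {
  public static void main(String[] args) {
    System.out.println(~5 + 1);  // -5: invert all bits, then add 1
    // Low 8 bits of -5 printed as binary: 11111011, matching the example above.
    System.out.println(Integer.toBinaryString(-5 & 0xFF));
  }
}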

 

Some background on one's complement:

https://ccrma.stanford.edu/~jos/mdft/One_s_Complement_Fixed_Point_Format.html

One's Complement is a particular assignment of bit patterns to numbers. For example, in the case of 3-bit binary numbers, we have the assignments shown in Table G.2.

Table G.2: Three-bit one's-complement binary fixed-point numbers.

Binary  Decimal
 000       0
 001       1
 010       2
 011       3
 100      -3
 101      -2
 110      -1
 111      -0

In general, N-bit numbers are assigned to binary counter values in the "obvious way" as integers from 0 to 2^(N-1) - 1, and then the negative numbers are assigned in reverse order, as shown in the example.

The term "one's complement" refers to the fact that negating a number in this format is accomplished by simply complementing the bit pattern (inverting each bit).

Note that there are two representations for zero (all 0s and all 1s). This is inconvenient when testing if a number is equal to zero. For this reason, one's complement is generally not used.
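
Both points are easy to reproduce by masking Java ints down to 3 bits (the helper name is made up for the example):

public class OnesComplementDemo {
  // One's-complement negation in 3 bits: invert every bit.
  static int negate3(int x) {
    return ~x & 0b111;
  }

  public static void main(String[] args) {
    System.out.println(Integer.toBinaryString(negate3(0b011))); // 100, i.e. -3
    System.out.println(Integer.toBinaryString(negate3(0b000))); // 111, the second zero
  }
}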

 

Two's complement:

https://ccrma.stanford.edu/~jos/mdft/Two_s_Complement_Fixed_Point_Format.html

In two's complement, numbers are negated by complementing the bit pattern and adding 1, with overflow ignored. From 0 to 2^(N-1) - 1, positive numbers are assigned to binary values exactly as in one's complement. The remaining assignments (for the negative numbers) can be carried out using the two's complement negation rule. Regenerating the example in this way gives Table G.3.

Table G.3: Three-bit two's-complement binary fixed-point numbers.

Binary  Decimal
 000       0
 001       1
 010       2
 011       3
 100      -4
 101      -3
 110      -2
 111      -1

 

Note that according to our negation rule, -(-4) = -4. Logically, what has happened is that the result has "overflowed" and "wrapped around" back to itself. Note that 3 + 1 = -4 also. In other words, if you compute 4 somehow, since there is no bit-pattern assigned to 4, you get -4, because -4 is assigned the bit pattern that would be assigned to 4 if N were larger. Note that numerical overflows naturally result in "wrap around" from positive to negative numbers (or from negative numbers to positive numbers). Computers normally "trap" overflows as an "exception." The exceptions are usually handled by a software "interrupt handler," and this can greatly slow down the processing by the computer (one numerical calculation is being replaced by a rather sizable program).

Note that temporary overflows are ok in two's complement; that is, if you add 1 to 3 to get -4, adding -1 to -4 will give 3 again. This is why two's complement is a nice choice: it can be thought of as placing all the numbers on a "ring," allowing temporary overflows of intermediate results in a long string of additions and/or subtractions. All that matters is that the final sum lie within the supported dynamic range.
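
The same ring behavior shows up with Java's 32-bit ints, which are two's complement and wrap silently:

public class WrapAroundDemo {
  public static void main(String[] args) {
    int max = Integer.MAX_VALUE;
    System.out.println(max + 1);         // Integer.MIN_VALUE: wraps, no exception
    System.out.println((max + 1) - 1);   // Integer.MAX_VALUE again
    System.out.println((3 + max) - max); // 3: the temporary overflow cancels out
  }
}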

Computers designed with signal processing in mind (such as so-called "Digital Signal Processing (DSP) chips") generally just do the best they can without generating exceptions. For example, overflows quietly "saturate" instead of "wrapping around" (the hardware simply replaces the overflow result with the maximum positive or negative number, as appropriate, and goes on). Since the programmer may wish to know that an overflow has occurred, the first occurrence may set an "overflow indication" bit which can be manually cleared. The overflow bit in this case just says an overflow happened sometime since it was last checked.

public static void writeVLong(DataOutput stream, long i) throws IOException {
  if (i >= -112 && i <= 127) {
    // Small values fit entirely in the first byte.
    stream.writeByte((byte)i);
    return;
  }

  int len = -112;
  if (i < 0) {
    i ^= -1L; // take one's complement
    len = -120;
  }

  // Count the payload bytes: one loop iteration per byte.
  long tmp = i;
  while (tmp != 0) {
    tmp = tmp >> 8;
    len--;
  }

  // Marker byte: [-120,-113] for positive values, [-128,-121] for negative.
  stream.writeByte((byte)len);

  // Recover the payload byte count from the marker.
  len = (len < -120) ? -(len + 120) : -(len + 112);

  // Write the payload, most-significant byte first.
  for (int idx = len; idx != 0; idx--) {
    int shiftbits = (idx - 1) * 8;
    long mask = 0xFFL << shiftbits;
    stream.writeByte((byte)((i & mask) >> shiftbits));
  }
}
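
WritableUtils also ships the matching reader; a minimal decoder consistent with the format above might look like this (a sketch, not the library source):

public static long readVLong(DataInput stream) throws IOException {
  byte firstByte = stream.readByte();
  if (firstByte >= -112) {
    return firstByte;                  // the value was stored in the marker itself
  }
  boolean negative = firstByte < -120; // [-128,-121] marks a negative value
  int len = negative ? -(firstByte + 120) : -(firstByte + 112);
  long i = 0;
  for (int idx = 0; idx < len; idx++) {
    i = (i << 8) | (stream.readByte() & 0xFF); // payload is big-endian
  }
  return negative ? ~i : i;            // undo the one's complement for negatives
}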