Writable Interface

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * Hadoop's basic serialization contract: an object writes its state to a
 * DataOutput binary stream and reads it back from a DataInput binary stream.
 */
public interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}
  • Writable mainly defines two methods: one writes the object's state to a DataOutput binary stream (I don't quite understand why the word "state" is used here), and the other reads its state back from a DataInput binary stream. A minimal custom implementation is sketched below.
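To make the contract concrete, here is a minimal sketch of a custom Writable; the class name IntPairWritable and its fields are hypothetical and not part of Hadoop:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical example: a pair of ints that serializes itself by
// writing/reading each field in a fixed order.
public class IntPairWritable implements Writable {
    private int first;
    private int second;

    public IntPairWritable() {}                  // no-arg constructor for deserialization

    public IntPairWritable(int first, int second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first);                     // write state to the binary stream
        out.writeInt(second);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt();                    // read state back in the same order
        second = in.readInt();
    }
}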
IntWritable writable = new IntWritable(163);

public static byte[] serialize(Writable writable) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataOutputStream dataOut = new DataOutputStream(out);
    writable.write(dataOut);          // let the Writable write itself to the stream
    dataOut.close();
    return out.toByteArray();         // the serialized form as a byte array
}
  • This mainly tests how an IntWritable is serialized.
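A quick usage sketch, assuming the serialize helper above: an IntWritable writes a 4-byte big-endian int, so 163 (0xa3) serializes to the bytes 00 00 00 a3.

// requires org.apache.hadoop.io.IntWritable
byte[] bytes = serialize(new IntWritable(163));
System.out.println(bytes.length);         // 4
for (byte b : bytes) {
    System.out.printf("%02x", b);         // prints 000000a3
}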
public static byte[] deserialize(Writable writable, byte[] bytes) throws IOException {
    ByteArrayInputStream in = new ByteArrayInputStream(bytes);
    DataInputStream dataIn = new DataInputStream(in);
    writable.readFields(dataIn);      // populate the Writable from the byte stream
    dataIn.close();
    return bytes;
}
  • This deserializes the IntWritable from the byte array.
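Continuing the sketch, the bytes produced above can be read back into a fresh IntWritable via the deserialize helper:

IntWritable newWritable = new IntWritable();
deserialize(newWritable, bytes);          // bytes from the serialize example above
System.out.println(newWritable.get());    // 163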
public interface WritableComparable<T> extends Writable, Comparable<T> {}

public interface RawComparator<T> extends Comparator<T> {
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}
  • IntWritable implements the WritableComparable interface, which is a subinterface of both Writable and Comparable. Comparing types is crucial for MapReduce, since keys are compared against one another during the sort phase. Hadoop also provides RawComparator (which extends Comparator).
  • Implementations of this interface can compare records read from a stream without deserializing them into objects, which avoids the overhead of object creation (see the sketch below).
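A sketch of both comparison paths, assuming Hadoop's WritableComparator (which supplies a registered raw comparator for IntWritable) and the serialize helper above:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.WritableComparator;

RawComparator<IntWritable> comparator = WritableComparator.get(IntWritable.class);

IntWritable w1 = new IntWritable(163);
IntWritable w2 = new IntWritable(67);
// object-level comparison: both values are already deserialized objects
System.out.println(comparator.compare(w1, w2) > 0);   // true

// raw comparison: works directly on the serialized bytes,
// no IntWritable objects are created from the streams
byte[] b1 = serialize(w1);
byte[] b2 = serialize(w2);
System.out.println(comparator.compare(b1, 0, b1.length, b2, 0, b2.length) > 0); // true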