HBase - MapReduce - Example of Using HBase as an Input Source | 那伊抹微笑

Author: 那伊抹微笑

CSDN blog: http://blog.csdn.net/u012185296

itdog8 link: http://www.itdog8.com/thread-203-1-1.html

Reference: http://abloz.com/hbase/book.html#mapreduce.example

1 Official example code

The following is an example of a MapReduce job that reads from HBase as its source. Note that it has only a Mapper and no Reducer, and the Mapper emits nothing.

The job is set up as follows...

Configuration config = HBaseConfiguration.create();
Job job = new Job(config, "ExampleRead");
job.setJarByClass(MyReadJob.class);     // class that contains mapper

Scan scan = new Scan();
scan.setCaching(500);        // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false);  // don't set to true for MR jobs
// set other scan attrs...

TableMapReduceUtil.initTableMapperJob(
  tableName,        // input HBase table name
  scan,             // Scan instance to control CF and attribute selection
  MyMapper.class,   // mapper
  null,             // mapper output key
  null,             // mapper output value
  job);
job.setOutputFormatClass(NullOutputFormat.class);   // because we aren't emitting anything from mapper

boolean b = job.waitForCompletion(true);
if (!b) {
  throw new IOException("error with job!");
}

...and the mapper must extend TableMapper...

public class MyMapper extends TableMapper<Text, LongWritable> {
  public void map(ImmutableBytesWritable row, Result value, Context context)
      throws InterruptedException, IOException {
    // process data for the row from the Result instance.
  }
}
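
The comment next to setCaching(500) in the job setup says that a caching value of 1 is bad for MapReduce jobs. A rough way to see why is to count scanner round trips, assuming each round trip returns at most `caching` rows and ignoring any size-based limits the server may apply. The class and method names below are hypothetical; the sketch uses only the JDK, not the HBase API.

```java
// Back-of-envelope model only: real scanners may also be bounded by result size.
class ScanCachingEstimate {

    /** Approximate number of scanner round trips: ceil(rows / caching). */
    static long roundTrips(long rows, int caching) {
        if (rows < 0 || caching <= 0) {
            throw new IllegalArgumentException("rows must be >= 0 and caching must be > 0");
        }
        return (rows + caching - 1) / caching;
    }

    public static void main(String[] args) {
        long rows = 1_000_000L; // assumed table size, for illustration
        for (int caching : new int[] {1, 500, 10000}) {
            System.out.println("caching=" + caching + " -> about "
                    + roundTrips(rows, caching) + " round trips");
        }
    }
}
```

Under this model, scanning one million rows takes about 1,000,000 round trips with caching 1, about 2,000 with caching 500, and about 100 with caching 10000. Larger values trade fewer round trips for more rows held in memory per fetch.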

2 My reference code

package com.itdog8.cloud.hbase.mr.test;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * TestHBaseAsSourceMapReduceMainClass
 *
 * @author 那伊抹微笑
 * @date 2015-07-30 18:00:21
 *
 */
public class TestHBaseAsSourceMapReduceMainClass {
 private static final Log _log = LogFactory.getLog(TestHBaseAsSourceMapReduceMainClass.class);

 private static final String JOB_NAME = "TestHBaseAsSourceMapReduce";
 private static String tmpPath = "/tmp/com/itdog8/yting/TestHBaseAsSourceMapReduce";
 private static String hbaseInputTble = "itdog8:test_1";

 public static class ExampleSourceMapper extends TableMapper<Text, Text> {
  private Text k = new Text();
  private Text v = new Text();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
   super.setup(context);
  }

  @Override
  protected void map(ImmutableBytesWritable key, Result value, Context context) throws IOException, InterruptedException {
   String rowkey = Bytes.toString(key.get());

   // This step only requires familiarity with the Result API; the rest is business logic
   try {

    // set value
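    k.set("example-key");   // placeholder; replace with data derived from rowkey or the Result
    v.set("example-value"); // placeholder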

    // context write to reducer
    context.write(k, v);
   } catch (Exception e) {
    e.printStackTrace();
   }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
   super.cleanup(context);
  }

 }

 public static void main(String[] args) throws Exception {
  // hbase configuration
  Configuration conf = HBaseConfiguration.create();
  conf.set("hbase.zookeeper.quorum", "a234-198.hadoop.com,a234-197.hadoop.com,a234-196.hadoop.com");
  conf.set("hbase.zookeeper.property.clientPort", "2181");

  // batch and caching
  Scan scan = new Scan();
  scan.setCaching(10000);
  scan.setCacheBlocks(false);
  scan.setMaxVersions(1);

  // set hadoop speculative execution to false
  conf.setBoolean("mapred.map.tasks.speculative.execution", false);
  conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);

  // tmp index path
  tmpPath = args[0];
  Path tmpIndexPath = new Path(tmpPath);
  FileSystem fs = FileSystem.get(conf);
  if(fs.exists(tmpIndexPath)) {
//	 fs.delete(tmpIndexPath, true); // dangerous
//	 _log.info("delete tmp index path : " + tmpIndexPath.getName());
   _log.warn("The HDFS path [" + tmpPath + "] already exists; please choose a different path.");
   return ;
  }

  // Job && conf
  Job job = new Job(conf, JOB_NAME);
  job.setJarByClass(TestHBaseAsSourceMapReduceMainClass.class);

  TableMapReduceUtil.initTableMapperJob(hbaseInputTble, scan, ExampleSourceMapper.class, Text.class, Text.class, job);
//	 job.setReducerClass(MyReducer.class); // your own processing logic
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(Text.class);
  job.setOutputFormatClass(TextOutputFormat.class);
  FileOutputFormat.setOutputPath(job, tmpIndexPath);

  int success = job.waitForCompletion(true) ? 0 : 1;

  System.exit(success);
 }
}
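
The main method above assigns args[0] to tmpPath unconditionally, so running the job without an argument fails with an ArrayIndexOutOfBoundsException and the default tmpPath is never used. One possible guard, sketched here with only the JDK, falls back to the default when no argument is given. The class name, method name, and constant are hypothetical and are not part of the original code.

```java
// Sketch of argument handling; not taken from the original job class.
class OutputPathArgs {

    // Mirrors the default tmpPath value declared in the class above.
    static final String DEFAULT_OUTPUT_PATH = "/tmp/com/itdog8/yting/TestHBaseAsSourceMapReduce";

    /** Returns args[0] when it is present and non-blank, otherwise defaultPath. */
    static String resolveOutputPath(String[] args, String defaultPath) {
        if (args != null && args.length > 0 && args[0] != null && !args[0].trim().isEmpty()) {
            return args[0].trim();
        }
        return defaultPath;
    }

    public static void main(String[] args) {
        System.out.println("output path: " + resolveOutputPath(args, DEFAULT_OUTPUT_PATH));
    }
}
```

In the job's main method, the assignment could then read tmpPath = resolveOutputPath(args, tmpPath); the existing check that the path does not already exist would still apply afterwards.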

Time: 2024-08-05 11:14:00
