Hadoop辅助排序样例二

1. 需求

求每年的最高温度

2. 样例数据

3. 思路、代码

将记录按年份分组并按温度降序排序，然后才将同一年份的所有记录送到一个 reducer 组，则各组的首条记录就是这一年的最高温度。实现此方案的要点是：

a. 定义包括自然键(年份)和自然值(温度)的组合键。

b. 根据组合键对记录进行排序。

c. 针对组合键进行分区和分组时均只考虑自然键。

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

/**
 * 组合键，此例中用于辅助排序，包括年份和温度。
 */
public class IntPair implements WritableComparable<IntPair> {

    private IntWritable first;
    private IntWritable second;

    public IntPair() {
        this.first = new IntWritable();
        this.second = new IntWritable();
        //若注释掉上面两行，使用时会发生异常 java.lang.NullPointerException at IntPair.readFields
    }

    public IntPair(int first, int second) {
        set(new IntWritable(first), new IntWritable(second));
    }

    public IntPair(IntWritable first, IntWritable second) {
        set(first, second);
    }

    public void set(IntWritable first, IntWritable second) {
        this.first = first;
        this.second = second;
    }

    public IntWritable getFirst() {
        return first;
    }

    public IntWritable getSecond() {
        return second;
    }

    public void write(DataOutput out) throws IOException {
        first.write(out);
        second.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        first.readFields(in);
        second.readFields(in);
    }

    @Override
    public int hashCode() {
        return first.hashCode() * 163 + second.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj instanceof IntPair) {
            IntPair ip = (IntPair) obj;
            return first.get() == ip.first.get() && second.get() == ip.second.get();
        }
        return false;
    }

    @Override
    public String toString() {
        return first + "\t" + second;
    }

    public int compareTo(IntPair o) {
        int cmp = first.compareTo(o.first);
        if (cmp == 0) {
            cmp = second.compareTo(o.second);
        }
        return cmp;
    }
}

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.io.IOException;

public class MaxTemperatureUsingSecondarySort extends Configured implements Tool {

    static class MaxTemperatureMapper extends Mapper<LongWritable, Text, IntPair, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] val = value.toString().split("\\t");
            if (val.length == 2) {
                context.write(new IntPair(Integer.parseInt(val[0]), Integer.parseInt(val[1])), NullWritable.get());
            }
        }
    }

    static class MaxTemperatureReducer extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {
        @Override
        protected void reduce(IntPair key, Iterable<NullWritable> values, Context context) throws IOException, InterruptedException {
            context.write(key, NullWritable.get()); //仅输出第一行
        }
    }

    //仅根据 first 分区
    public static class FirstPartitioner extends Partitioner<IntPair, NullWritable> {
        @Override
        public int getPartition(IntPair key, NullWritable value, int numPartitions) {
            return (key.getFirst().hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    //仅根据 first 分组
    public static class GroupComparator extends WritableComparator {
        private static final IntWritable.Comparator INT_COMPARATOR = new IntWritable.Comparator();

        protected GroupComparator() {
            super(IntPair.class, true);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            try {
                int firstL1 = WritableUtils.decodeVIntSize(b1[s1]) + readVInt(b1, s1);
                int firstL2 = WritableUtils.decodeVIntSize(b2[s2]) + readVInt(b2, s2);
                return INT_COMPARATOR.compare(b1, s1, firstL1, b2, s2, firstL2);
            } catch (IOException e) {
                throw new IllegalArgumentException(e);
            }
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof IntPair && b instanceof IntPair) {
                return ((IntPair) a).getFirst().compareTo(((IntPair) b).getFirst());
            }
            return super.compare(a, b);
        }
    }

    //根据组合键排序
    public static class KeyComparator extends WritableComparator {
        protected KeyComparator() {
            super(IntPair.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            if (a instanceof IntPair && b instanceof IntPair) {
                IntPair ip1 = (IntPair) a;
                IntPair ip2 = (IntPair) b;
                int cmp = ip1.getFirst().compareTo(ip2.getFirst()); //升序（年份）
                if (cmp != 0) {
                    return cmp;
                }
                return -ip1.getSecond().compareTo(ip2.getSecond()); //降序（温度）
            }
            return super.compare(a, b);
        }
    }

    public int run(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Parameter number is wrong, please enter two parameters：<input> <output>");
            System.exit(-1);
        }

        Path inputPath = new Path(otherArgs[0]);
        Path outputPath = new Path(otherArgs[1]);

        //conf.set("fs.defaultFS", "hdfs://vmnode.zhch:9000");
        Job job = Job.getInstance(conf, "MaxTemperatureUsingSecondarySort");
        //job.setJar("F:/workspace/AssistRanking2/target/AssistRanking2-1.0-SNAPSHOT.jar");

        job.setMapperClass(MaxTemperatureMapper.class);
        job.setPartitionerClass(FirstPartitioner.class);
        job.setSortComparatorClass(KeyComparator.class); //默认根据 Key 的 compareTo 函数排序
        job.setGroupingComparatorClass(GroupComparator.class);
        job.setReducerClass(MaxTemperatureReducer.class);
        job.setMapOutputKeyClass(IntPair.class);
        job.setOutputKeyClass(IntPair.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new MaxTemperatureUsingSecondarySort(), args);
        System.exit(exitCode);
    }
}

4. 运行截图

注：本例源自《Hadoop权威指南》第三版 8.2.4

时间： 2024-10-28 15:31:37

Hadoop辅助排序样例二的相关文章

Hadoop辅助排序样例一

1. 样例数据 011990-99999 SIHCCAJAVRI 012650-99999 TYNSET-HANSMOEN 012650-99999 194903241200 111 012650-99999 194903241800 78 011990-99999 195005150700 0 011990-99999 195005151200 22 011990-99999 195005151800 -11 2. 需求 3. 思路.代码将气象站ID相同的气象站信息和天气信息交由同一个 Re

hadoop的WordCount样例

package cn.lmj.mapreduce; import java.io.IOException; import java.util.Iterator; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.LongWritable; import org.apache.hadoop.io.Text; import org.apache.hadoop.mapred.FileInputFormat; import org

solr特点三: 排序样例汇总

目的是提供solrj 实现查询的样例参考单维度排序 //查询条件 query.setQuery(queryString); // add 是添加 query.addSortField(field_price, ORDER.asc); //set是覆盖,也就是后面的覆盖前面的. query.setSortField(field_price,ORDER.desc); //如果需要第一维度值相等,按第二维度继续排序的话,继续add query.addSortField(field_fans_cou

[0010] windows 下 eclipse 开发 hdfs程序样例 (二)

目的: 学习windows 开发hadoop程序的配置相关: [0007] windows 下 eclipse 开发 hdfs程序样例环境: 基于以下环境配置好后. [0008] Windows 7 下 hadoop 2.6.4 eclipse 本地开发调试配置 1. 新建HDFS下载文件类在已有mapreduce项目中新建类添加如下代码,代码从[0007]中取出小修改功能:从hdfs下载文件到windows本地 package hadoop.hdfs; import java.io.F

Android利用Volley异步载入数据完整具体演示样例(二)

MainActivity例如以下: package cc.y; import android.app.Activity; import android.content.Context; import android.graphics.Bitmap; import android.graphics.Bitmap.Config; import android.os.Bundle; import android.util.LruCache; import android.widget.ImageVie

linux中生成考核用的FAT32文件系统结构样例(二)

实验FAT32-2说明:FAT32-2\目录下的xxx.tar.gz解压后是一个FAT32文件系统的分区镜像,其DBR及备份的一些参数错误,请使用winhex手工方式修复DBR,并回答修改后的DBR的md5 HASH值. 要求: 1.利用WINHEX手工方式读取. 2.不得使用WINHEX模板功能. 3.不得使用WINHEX文件系统解析功能. 4.出错部分仅为DBR保留扇区.FAT表份数.FAT表大小.文件系统扇区总数.每簇扇区数.有效结束标志,其余部分不得修改. 5.文件系统扇区总数为可利用的

linux中生成考核用的NTFS文件系统结构样例(二)

实验NTFS-2说明:NTFS-2.img是一个包含NTFS文件系统的磁盘镜像,请使用winhex手工方式读出这个文件系统内的指定文件,并回答其md5 HASH值. 要求: 1.利用WINHEX手工方式读取. 2.不得使用WINHEX模板功能. 3.不得使用WINHEX文件系统解析功能. 4.填写的MD5 HASH值全部为大写,不包括0x头标或H尾标,中间不得有任何间隔符号(包括空格.制表符.'-'等符号),以WINHEX软件运算出的HASH值为准. 实验目的: 1.实现手工方式跟踪一个NTFS

drools6入门样例(二)

产品规则例如以下: 1:单个产品数量超过2个,该产品打9折 2:总价格超过1000,立减50 新建maven项目.增加drools的依赖 <dependency> <groupId>org.drools</groupId> <artifactId>drools-core</artifactId> <version>6.2.0.Final</version> </dependency> <dependenc

样例二：计算圆周率

1 #include<iostream> 2 #include<cmath> 3 using namespace std; 4 5 #define r 1 6 7 /* 8 计算2^n边型的单边边长 9 */ 10 double singleLength(int n) { 11 if (n == 2) 12 return sqrt(2)*r; 13 double sinalphad2 = singleLength(n - 1) / (2 * r); 14 double cosalp