MapReduce Design Patterns

1. Design Patterns and MapReduce
Design Patterns
MapReduce History
MapReduce and Hadoop Refresher
Hadoop Example: Word Count
Pig and Hive
2. Summarization Patterns
Numerical Summarizations
Pattern Description
Numerical Summarization Examples
Inverted Index Summarizations
Pattern Description
Inverted Index Example
Counting with Counters
Pattern Description
Counting with Counters Example
3. Filtering Patterns
Filtering
Pattern Description
Filtering Examples
Bloom Filtering
Pattern Description
Bloom Filtering Examples
Top Ten
Pattern Description
Top Ten Examples
Distinct
Pattern Description
Distinct Examples
4. Data Organization Patterns
Structured to Hierarchical
Pattern Description
Structured to Hierarchical Examples
Partitioning
Pattern Description
Partitioning Examples
Binning
Pattern Description
Binning Examples
Total Order Sorting
Pattern Description
Total Order Sorting Examples
Shuffling
Pattern Description
Shuffle Examples
5. Join Patterns
A Refresher on Joins
Reduce Side Join
Pattern Description
Reduce Side Join Example
Reduce Side Join with Bloom Filter
Replicated Join
Pattern Description
Replicated Join Examples
Composite Join
Pattern Description
Composite Join Examples
Cartesian Product
Pattern Description
Cartesian Product Examples
6. Metapatterns
Job Chaining
With the Driver
Job Chaining Examples
With Shell Scripting
With JobControl
Chain Folding
The ChainMapper and ChainReducer Approach
Chain Folding Example
Job Merging
Job Merging Examples
7. Input and Output Patterns
Customizing Input and Output in Hadoop
InputFormat
RecordReader
OutputFormat
RecordWriter
Generating Data
Pattern Description
Generating Data Examples
External Source Output
Pattern Description
External Source Output Example
External Source Input
Pattern Description
External Source Input Example
Partition Pruning
Pattern Description
Partition Pruning Examples
8. Final Thoughts and the Future of Design Patterns
Trends in the Nature of Data
Images, Audio, and Video
Streaming Data
The Effects of YARN
Patterns as a Library or Component
How You Can Help
A. Bloom Filters

Time: 2024-10-29 10:46:40

Articles related to MapReduce Design Patterns

MapReduce Design Patterns (Chapter 2, Part 2) (3)

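Median and standard deviation: Computing the median and the standard deviation is slightly more complex than the previous examples. Because these operations are not associative, they do not benefit from a combiner as easily. The median is the value that splits a data set into two equal halves, one with values larger than the median and one with values smaller. Finding it requires the complete data set, and the data must be sorted, which is an obstacle because MapReduce does not sort by values. The standard deviation tells us how much the data varies from the mean, so the mean has to be found first. The easiest way to perform these operations is to copy the list of values into a temporary list in order to find the median, or to iterate once more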
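The approach described in this excerpt (buffer the values for one key, sort them to find the median, then pass over them again for the standard deviation) can be sketched in plain Python. This is a minimal illustration and not the book's Hadoop code: the function name `median_and_stddev` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption.

```python
import math


def median_and_stddev(values):
    """Reducer-style computation for the values of a single key."""
    # Copy the values into a temporary list and sort it, because the
    # framework does not deliver values in sorted order.
    buffered = sorted(values)
    if not buffered:
        raise ValueError("no values for this key")

    n = len(buffered)
    mid = n // 2
    if n % 2 == 0:
        # Even count: average the two middle values.
        median = (buffered[mid - 1] + buffered[mid]) / 2
    else:
        median = float(buffered[mid])

    # The mean must be known before the deviation can be accumulated,
    # so the buffered list is iterated a second time.
    mean = sum(buffered) / n
    sum_of_squares = sum((v - mean) ** 2 for v in buffered)

    # Assumption: sample standard deviation; a single value has no spread.
    stddev = math.sqrt(sum_of_squares / (n - 1)) if n > 1 else 0.0
    return median, stddev
```

Because the whole list for a key is held in memory, this sketch only suits keys with a modest number of values.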

MapReduce Design Patterns (Chapter 2, Part 1) (2)

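As more data is loaded into systems every day, the volume of data becomes enormous. This chapter focuses on design patterns that produce a top-level, summarized view of your data, giving you insights that may not be available from a localized subset of the data. Summarization analytics are all about grouping similar data together and then computing a statistic, building an index, or simply counting. Computing an aggregate over groups in your data set is a good way to get results quickly. For example, you might want to calculate the total amount of money held under some grouping rule, or the average time people spend on the internet by demographic. With a new data set, you can start with these types of analysis to work out what in the data is interesting or unique and what needs a closer look. The patterns in this chapter are numerical summarization, inverted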
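The grouping-then-aggregating idea in this excerpt can be sketched as a tiny in-memory imitation of the map, group, and reduce steps. It is an illustration under stated assumptions, not Hadoop code: the record layout `(group, minutes_online)` and the function names are hypothetical. The mapper emits `(sum, count)` pairs because those can be added together in any order, while averages themselves cannot be averaged safely.

```python
from collections import defaultdict


def map_record(record):
    """Emit (group, (sum, count)) for one hypothetical (group, minutes) record."""
    group, minutes = record
    yield group, (minutes, 1)


def reduce_group(pairs):
    """Combine partial (sum, count) pairs and return the group average."""
    total = sum(partial_sum for partial_sum, _ in pairs)
    count = sum(partial_count for _, partial_count in pairs)
    return total / count


def average_by_group(records):
    """Imitate the shuffle: collect mapper output by key, then reduce each key."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            grouped[key].append(value)
    return {key: reduce_group(values) for key, values in grouped.items()}
```

For example, `average_by_group([("18-25", 120), ("18-25", 180), ("26-35", 90)])` groups the first two records under one key and averages them separately from the third.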

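Design Patterns

Design Patterns: A design pattern is a catalogued summary of code-design experience that has been used repeatedly and is known to most practitioners. Design patterns are used to make code reusable, easier for others to understand, and more reliable. Design patterns benefit the author, other developers, and the system alike; they turn coding into a true engineering discipline and are the cornerstone of software engineering, like the bricks of a building. Applying design patterns sensibly in a project can solve many problems well. Each pattern corresponds to an underlying principle in practice, and each pattern describes a problem that keeps recurring around us, and that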

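Design Patterns 5: Prototype Pattern

Prototype pattern: specify the kinds of objects to create using a prototypical instance, and create new objects by copying that prototype. When we need several identical class instances, there is no need to create each one with the new operator; the prototype pattern can reduce memory consumption and let class instances be reused. // Abstract prototype class with an interface that returns itself public abstract class Prototype5 { public string Id { get; set; } public Prototype5(string id) { this.Id = id;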

Cocoa Design Patterns

Book Description: This is the Rough Cut version of the printed book. Much of the technology embodied by Apple's Cocoa software development frameworks has been in commercial use since 1988, and in spite of many years of use, the Cocoa frameworks are s

Learning JavaScript Design Patterns The Observer Pattern

The Observer Pattern The Observer is a design pattern where an object (known as a subject) maintains a list of objects depending on it (observers), automatically notifying them of any changes to state. When a subject needs to notify observers about s

[Design Patterns] 4. Creation Pattern

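A design pattern is a catalogued summary of code-design experience that has been used repeatedly and is known to most practitioners. The purpose of using design patterns is to improve code reusability, make code easier for others to understand, and ensure code reliability. They turn coding into a true engineering discipline. Four key elements: (1) Pattern Name, (2) Problem, (3) Solution, (4) Consequences. 01. Factory Method Pattern /* The product should be created by its own factory. */ Log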

Learning JavaScript Design Patterns The Module Pattern

The Module Pattern Modules Modules are an integral piece of any robust application's architecture and typically help in keeping the units of code for a project both cleanly separated and organized. In JavaScript, there are several options for impleme

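Design Patterns 4 酒干倘卖无: Abstract Factory Pattern (AbstractFactory)

Abstract Factory pattern (AbstractFactory): provide an interface for creating products that is responsible for creating related or dependent objects, without specifying their concrete classes. When a product family changes, the abstract factory supports the open-closed principle (the system should be open for extension and closed for modification), so it is easy to extend. When a new product is added, however, it does not support the open-closed principle.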