Lambda Architecture: Achieving Velocity and Volume with Big Data

http://www.semantikoz.com/blog/lambda-architecture-velocity-volume-big-data-hadoop-storm/

Big data architecture paradigms are commonly separated into two (supposedly) diametrically opposed models: the more traditional batch processing and (near) real-time processing. The most popular technologies representing the two are Hadoop with MapReduce and Storm, respectively. However, a hybrid solution, the Lambda Architecture, challenges the idea that these approaches have to exclude each other. The Lambda Architecture combines a slow and a fast lane of data processing to achieve the best of both worlds: fast results as well as deep, large-scale processing.

Usually, one or the other architecture has been implemented because of a business requirement. Eventually, business users or customers commonly reach the point where they want either a more historical view or more real-time insight, neither of which the deployed architecture can provide. At this point a hybrid solution becomes the only realistic option, and one that brings some surprising benefits with it.

Lambda Architecture explained

The Lambda Architecture receives data centrally and does as little processing as possible before copying and splitting the data stream into the real time layer and the batch layer. The batch layer collects the data in its raw form in a data sink such as HDFS or S3. Hadoop jobs regularly process the data and write the results to a data store.

Lambda architecture duplicates incoming data and processes them in parallel at different speeds
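The ingestion step can be sketched in a few lines of Python. This is a minimal in-memory illustration, not the original author's implementation: the `Event` and `Ingest` names are hypothetical, a Python list stands in for the HDFS/S3 sink, and a second list stands in for whatever feeds the real time layer (e.g. a message queue). The point is that the same immutable record is copied unchanged into both lanes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Event:
    key: str    # hypothetical record key, e.g. a page URL
    value: int  # hypothetical measurement, e.g. one page view
    ts: int     # hypothetical event time in seconds


@dataclass
class Ingest:
    # Append-only sink for the batch layer (stand-in for HDFS or S3).
    raw_log: List[Event] = field(default_factory=list)
    # Feed for the real time layer (stand-in for a message queue).
    speed_queue: List[Event] = field(default_factory=list)

    def receive(self, event: Event) -> None:
        # No parsing or aggregation here: the same immutable record is
        # copied into both lanes and each layer processes it at its own pace.
        self.raw_log.append(event)
        self.speed_queue.append(event)


ingest = Ingest()
ingest.receive(Event(key="/home", value=1, ts=0))
ingest.receive(Event(key="/about", value=1, ts=2))
```

Keeping the ingestion step this thin means neither layer depends on the other's processing logic.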

Since this process is fully batched, the data store can be greatly simplified. It must support random reads, i.e. it needs some kind of index, but it can do away with random writes, locking, and consistency issues. An example of such a system is ElephantDB.
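As a rough sketch of why such a store is simple, assume the batch job just sums a value per key over the whole raw dataset (a hypothetical aggregation chosen for illustration). The result is handed out as a read-only mapping that supports indexed random reads but no writes. This is not how ElephantDB is implemented; it only mirrors the "recompute everything, then serve read-only" pattern.

```python
from collections import Counter
from types import MappingProxyType
from typing import Iterable, Mapping, Tuple


def build_batch_view(raw: Iterable[Tuple[str, int]]) -> Mapping[str, int]:
    """Recompute the whole view from the raw dataset, like one batch job run."""
    totals = Counter()
    for key, value in raw:
        totals[key] += value
    # The view is only ever replaced as a whole, never updated in place,
    # so a read-only mapping suffices: indexed random reads, no random
    # writes, no locking.
    return MappingProxyType(dict(totals))


view = build_batch_view([("/home", 1), ("/about", 1), ("/home", 1)])
print(view["/home"])  # 2
```

Attempting `view["/home"] = 0` raises `TypeError`, which is the in-memory analogue of a store with no random-write path.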

The problem with batch processing is the time it takes. For example, the process above may take hours or days. In the meantime, new data keeps arriving, and subsequent processes or services continue to work with information that is hours or days old. The real time layer solves this by processing its copy of the data within seconds or minutes and storing the results in a fast random read and write store. This store is more complex, since it has to be updated constantly.
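The extra complexity of the real time store comes from every event being a random read-modify-write, possibly from several consumers at once. The sketch below is a hypothetical in-memory `RealtimeStore` guarded by a `threading.Lock`; a real deployment would use a database with its own concurrency control, so treat this only as an illustration of the coordination the batch store avoids.

```python
import threading
from collections import defaultdict
from typing import Dict


class RealtimeStore:
    """In-memory stand-in for the speed layer's random read/write store."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = defaultdict(int)
        self._lock = threading.Lock()

    def increment(self, key: str, delta: int = 1) -> None:
        # Every incoming event triggers a read-modify-write, so concurrent
        # writers must be coordinated; the batch store never needs this.
        with self._lock:
            self._counts[key] += delta

    def get(self, key: str) -> int:
        with self._lock:
            return self._counts.get(key, 0)


store = RealtimeStore()


def consume(n: int) -> None:
    # Hypothetical consumer applying n events for the same key.
    for _ in range(n):
        store.increment("/home")


workers = [threading.Thread(target=consume, args=(1000,)) for _ in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(store.get("/home"))  # 4000
```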

The complexity of the real time layer and its store is manageable, since it only has to store and serve a sliding window of data that needs to be roughly as long as the batch process takes. The results of both layers are merged, and real time information is replaced by batch layer data as soon as the latter becomes available. In many cases this allows the real time process to work with good approximations, since its results are replaced by highly precise data within a short period.
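The merge and the sliding window can be sketched together. The assumptions here are ours, not the article's: each batch view is tagged with a `batch_cutoff` timestamp (all events up to and including it are covered), the speed layer keeps raw `(ts, key, value)` increments, and a query adds the batch value to the speed-layer increments newer than the cutoff. `SpeedLayer` and `query` are hypothetical names.

```python
from typing import Dict, List, Tuple


class SpeedLayer:
    """Holds only events the last batch run has not yet covered."""

    def __init__(self) -> None:
        self.window: List[Tuple[int, str, int]] = []  # (ts, key, value)

    def add(self, ts: int, key: str, value: int) -> None:
        self.window.append((ts, key, value))

    def expire(self, batch_cutoff: int) -> None:
        # Once a batch view covering everything up to batch_cutoff is live,
        # the real time results for that period are dropped.
        self.window = [e for e in self.window if e[0] > batch_cutoff]


def query(key: str, batch_view: Dict[str, int], batch_cutoff: int,
          speed: SpeedLayer) -> int:
    # Batch data is authoritative up to batch_cutoff; the speed layer only
    # contributes events newer than that.
    recent = sum(v for ts, k, v in speed.window if k == key and ts > batch_cutoff)
    return batch_view.get(key, 0) + recent


speed = SpeedLayer()
speed.add(ts=5, key="/home", value=1)   # already covered by the batch run
speed.add(ts=15, key="/home", value=2)  # newer than the batch cutoff
batch_view = {"/home": 10}              # computed over events with ts <= 10
print(query("/home", batch_view, batch_cutoff=10, speed=speed))  # 12
speed.expire(batch_cutoff=10)
print(len(speed.window))  # 1
```

Because the window only ever spans roughly one batch cycle, the speed layer's state stays small no matter how large the historical dataset grows.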

Lambda Architecture benefits

Adding another layer to an architecture has major advantages. Firstly, data can be processed historically with high precision and involved algorithms without losing the short-term information, alerts, and insights provided by the real time layer. Secondly, the cost of the additional layer is offset by a dramatic reduction in random write storage requirements. The batch write storage also provides the option to switch data at predefined times and to version data.
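Switching and versioning fall out naturally when each batch run writes a complete new view instead of updating the old one. A minimal sketch, assuming a hypothetical `VersionedBatchStore` that keeps every published view in memory and moves a single "current" pointer; a production store would persist versions and prune old ones.

```python
from types import MappingProxyType
from typing import Dict, List, Mapping


class VersionedBatchStore:
    """Keeps every published batch view; readers see one 'current' version."""

    def __init__(self) -> None:
        self._versions: List[Mapping[str, int]] = []
        self._current = -1

    def publish(self, view: Dict[str, int]) -> int:
        # Each batch run writes a complete, immutable new version and then
        # switches readers over in one step; older versions are kept.
        self._versions.append(MappingProxyType(dict(view)))
        self._current = len(self._versions) - 1
        return self._current

    def switch_to(self, version: int) -> None:
        # Roll back (or forward) to any version that was published earlier.
        if not 0 <= version < len(self._versions):
            raise ValueError(f"unknown version {version}")
        self._current = version

    def get(self, key: str) -> int:
        if self._current < 0:
            raise LookupError("no batch view published yet")
        return self._versions[self._current].get(key, 0)


store = VersionedBatchStore()
v0 = store.publish({"/home": 10})
v1 = store.publish({"/home": 12})
print(store.get("/home"))  # 12
store.switch_to(v0)
print(store.get("/home"))  # 10
```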

Lastly, and importantly, the data sink of raw data offers the option to recover from human mistakes, e.g. deploying bugs that write erroneous aggregated data, from which other architectures cannot recover. Another option is to retrospectively improve data extraction or learning algorithms and apply them to the whole historical dataset. This is extremely helpful in agile and startup environments, where MVPs postpone much of what could be done to later stages.
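Recovery works because the raw data is never overwritten: after fixing a buggy aggregation, the view is simply rebuilt from the full raw log. The sketch below is illustrative only; `RAW_LOG`, `buggy_totals`, `fixed_totals`, and `rebuild_view` are hypothetical names, and the "bug" is a deliberately simple overwrite-instead-of-add mistake.

```python
from typing import Callable, Dict, Iterable, Tuple

Event = Tuple[str, int]  # (key, value)

# Immutable raw data in the batch layer's sink; it is never modified.
RAW_LOG: Tuple[Event, ...] = (("/home", 1), ("/home", 2), ("/about", 5))


def buggy_totals(events: Iterable[Event]) -> Dict[str, int]:
    out: Dict[str, int] = {}
    for key, value in events:
        out[key] = value  # bug: overwrites instead of accumulating
    return out


def fixed_totals(events: Iterable[Event]) -> Dict[str, int]:
    out: Dict[str, int] = {}
    for key, value in events:
        out[key] = out.get(key, 0) + value
    return out


def rebuild_view(aggregate: Callable[[Iterable[Event]], Dict[str, int]]) -> Dict[str, int]:
    # Always recompute from the complete raw dataset, so a fixed or improved
    # algorithm is applied to all historical data.
    return aggregate(RAW_LOG)


bad = rebuild_view(buggy_totals)   # {'/home': 2, '/about': 5}
good = rebuild_view(fixed_totals)  # {'/home': 3, '/about': 5}
```

Had the buggy job written its results over the only copy of the data, the correct totals could not be recovered; with the raw log intact, redeploying the fix is enough.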

Original URL: https://www.cnblogs.com/dadadechengzi/p/12639176.html

Time: 2024-08-29 22:08:22
