Storm(2) - Log Stream Processing

Introduction

This chapter presents an implementation recipe for an enterprise log storage, search, and analysis solution based on the Storm processor. Log data processing isn't necessarily a problem that needs solving again; it is, however, a good analogy for stream processing problems in general.

Stream processing is a key architectural concern in the modern enterprise; however, streams of data are often semi-structured at best. By presenting an approach to enterprise log processing, this chapter aims to give the reader all the key elements needed to achieve this level of capability on any kind of data. Log data is also extremely convenient in an academic setting, given its sheer abundance. A key success factor for any stream processing or analytics effort is a deep understanding of the actual data, and sourcing suitable data can often be difficult.

It is, therefore, important that the reader considers how the architectural blueprint could be applied to other forms of data within the enterprise.

The following diagram illustrates all the elements that we will develop in this chapter:

You will learn how to create a log agent that can be distributed across all the nodes in your environment. You will also learn to collect these log entries centrally using Storm and Redis, and then analyze, index, and count the logs, so that we can search them later and display basic statistics for them.

Creating a log agent

1. Download logstash, which will be configured to stream the local node's logs into the topology

wget https://logstash.objects.dreamhost.com/release/logstash-1.1.7-monolithic.jar 

2. Create the file shipper.conf

input {
    file {
        type => "syslog"
        path => ["/var/log/messages", "/var/log/system.*", "/var/log/*.log"]
    }
}

output {
    #output events to stdout for debugging. feel free to remove it
    stdout {
    }

    redis {
        host => "localhost"
        data_type => "list"
        key => "rawLogs"
    }
}
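
To get a feel for which files the three path patterns in shipper.conf would pick up, the matching can be approximated in Python. This is only a sketch: it assumes the patterns behave like ordinary shell globs in which * does not cross a directory separator, the helper name is_shipped is hypothetical, and the sample file names are made up for illustration.

```python
from pathlib import PurePosixPath

# The path patterns copied from shipper.conf.
PATTERNS = ["/var/log/messages", "/var/log/system.*", "/var/log/*.log"]


def is_shipped(filename):
    """Return True if filename matches at least one pattern.

    PurePosixPath.match compares an absolute pattern component by
    component, so "*" never spans a "/" here.
    """
    path = PurePosixPath(filename)
    return any(path.match(pattern) for pattern in PATTERNS)


# Hypothetical file names, for illustration only.
for name in ["/var/log/messages", "/var/log/system.log",
             "/var/log/auth.log", "/var/log/nginx/access.log"]:
    print(name, is_shipped(name))
```

Under that assumption, a log file in a subdirectory such as /var/log/nginx/ would not be shipped unless a pattern for that directory is added to the path list.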

3. Start a local instance of Redis, and then start logstash

java -jar logstash-1.1.7-monolithic.jar -f shipper.conf
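
With both processes running, Redis acts as a queue between the log agent and whatever reads the logs next. The handoff can be sketched without a Redis server, using an in-memory deque as a stand-in for the rawLogs list. The assumptions here are that the agent appends each event to the tail of the list as a JSON string and that a consumer removes entries from the head. The event fields shown (type, path, message) and the function names ship and consume are illustrative, not the exact format logstash emits.

```python
import json
from collections import deque

# In-memory stand-in for the Redis list named by "key" in shipper.conf.
raw_logs = deque()


def ship(message, source_path, event_type="syslog"):
    """Producer side: serialize an event and append it to the tail of the list."""
    event = {"type": event_type, "path": source_path, "message": message}
    raw_logs.append(json.dumps(event))


def consume():
    """Consumer side: remove and decode the oldest entry, or return None if empty."""
    if not raw_logs:
        return None
    return json.loads(raw_logs.popleft())


# Two made-up log lines, shipped in order.
ship("kernel: eth0 link up", "/var/log/messages")
ship("sshd[411]: session opened", "/var/log/auth.log")

first = consume()
print(first["message"])
```

Because entries are appended at one end and removed at the other, events are read in the order in which they were shipped.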

Date: 2024-10-23 08:27:07
