Introduction
This chapter presents an implementation recipe for an enterprise log storage, search, and analysis solution based on the Storm processor. Log data processing isn't necessarily a problem that needs solving again; it is, however, a good analogy.
Stream processing is a key architectural concern in the modern enterprise; however, streams of data are often semi-structured at best. By presenting an approach to enterprise log processing, this chapter is designed to provide the reader with all the key elements needed to achieve this level of capability on any kind of data. A key success factor for any stream processing or analytics effort is a deep understanding of the actual data, and sourcing suitable data can often be difficult; log data is also extremely convenient in an academic setting given its sheer abundance.
It is, therefore, important that the reader considers how the architectural blueprint could be applied to other forms of data within the enterprise.
The following diagram illustrates all the elements that we will develop in this chapter:
You will learn how to create a log agent that can be distributed across all the nodes in your environment. You will also learn to collect these log entries centrally using Storm and Redis, and then analyze, index, and count the logs, such that we will be able to search them later and display basic statistics for them.
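To give a flavor of how the collection stage fits together, the following is a minimal, hypothetical sketch of a Storm spout that pops raw log entries from a Redis list and emits them into the topology. The class name LogSpout, the "rawLogs" key, and the pre-Apache backtype.storm package names are assumptions for illustration, not the chapter's final code:

    // Illustrative sketch only: a Storm spout that drains raw log lines from
    // the Redis list that the log agent writes to. Names such as LogSpout and
    // "rawLogs" are placeholders; error handling is omitted for brevity.
    import java.util.Map;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import backtype.storm.utils.Utils;
    import redis.clients.jedis.Jedis;

    public class LogSpout extends BaseRichSpout {
        private transient Jedis jedis;
        private SpoutOutputCollector collector;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            // Connect in open() rather than the constructor, because spouts are
            // serialized and shipped to worker nodes before they start.
            this.jedis = new Jedis("localhost");
        }

        @Override
        public void nextTuple() {
            // The log agent pushes each event onto the "rawLogs" list; pop one per call.
            String rawLog = jedis.lpop("rawLogs");
            if (rawLog != null) {
                collector.emit(new Values(rawLog));
            } else {
                // Nothing available yet; back off briefly to avoid busy-spinning.
                Utils.sleep(50);
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("log"));
        }
    }

Downstream bolts would then parse, index, and count the tuples emitted on the "log" field.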
Creating a log agent
1. Download logstash, which will be used to stream each local node's logs into the topology:

    wget https://logstash.objects.dreamhost.com/release/logstash-1.1.7-monolithic.jar

2. Create the file shipper.conf with the following contents:

    input {
      file {
        type => "syslog"
        path => ["/var/log/messages", "/var/log/system.*", "/var/log/*.log"]
      }
    }
    output {
      # Output events to stdout for debugging; feel free to remove it
      stdout { }
      redis {
        host => "localhost"
        data_type => "list"
        key => "rawLogs"
      }
    }

3. Start a local instance of Redis, and then start logstash:

    java -jar logstash-1.1.7-monolithic.jar -f shipper.conf
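Once Redis and logstash are running, it can be helpful to confirm that events are actually accumulating before wiring up the topology. The following small Java check is not part of the original recipe; it assumes the Jedis client is on the classpath and uses the "rawLogs" key from shipper.conf to print the list length and a few sample events:

    // Optional sanity check (an addition, not part of the recipe): confirm that
    // logstash is pushing raw events onto the "rawLogs" Redis list.
    import java.util.List;

    import redis.clients.jedis.Jedis;

    public class RawLogsCheck {
        public static void main(String[] args) {
            Jedis jedis = new Jedis("localhost");
            // How many events has logstash shipped so far?
            System.out.println("rawLogs length: " + jedis.llen("rawLogs"));
            // Peek at the first few events without removing them from the list.
            List<String> sample = jedis.lrange("rawLogs", 0, 2);
            for (String event : sample) {
                System.out.println(event);
            }
            jedis.disconnect();
        }
    }

If the list length stays at zero, check that the paths in shipper.conf match log files that actually exist on the node and that Redis is reachable on localhost.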