在"hello World" 示例中,我们已经见到并介绍了Logstash 的运行流程和配置的基础语法。 请记住一个原则:Logstash 配置一定要有一个 input 和一个 output。在演示过程中,如果没有写明 input,默认就会使用 "hello world" 里我们已经演示过的 input/stdin ,同理,没有写明的 output 就是 output/stdout
If you run into problems, consult this document: http://udn.yyuap.com/doc/logstash-best-practice-cn/input/index.html. The input plugins are explained in detail below:
(1) Standard input. type and tags are special fields on a Logstash event. type marks the event type; we generally know in advance what type an event belongs to. tags, by contrast, are added or removed by individual plugins as the data is processed.
[[email protected] test]# vim stdin.conf
input {
    stdin {
        add_field => {"key" => "value"}
        codec => "plain"
        tags => ["add"]
        type => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[[email protected] logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2017-05-24T08:11:45.852Z",
          "type" => "std-lqb",
           "key" => "value",
          "tags" => [
        [0] "add"
    ],
          "host" => "localhost.localdomain"
}
abclqb
{
       "message" => "abclqb",
      "@version" => "1",
    "@timestamp" => "2017-05-24T08:13:21.192Z",
          "type" => "std-lqb",
           "key" => "value",
          "tags" => [
        [0] "add"
    ],
          "host" => "localhost.localdomain"
}

##### Modify stdin.conf: change add_field and add more tags
[[email protected] test]# vim stdin.conf
input {
    stdin {
        add_field => {"key" => "value22222222222222222222222222222222222222222222"}
        codec => "plain"
        tags => ["add","xxyy","abc"]
        type => "std-lqb"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[[email protected] logstash]# /usr/local/logstash/bin/logstash -f test/stdin.conf
Settings: Default pipeline workers: 1
Logstash startup completed
hello world
{
       "message" => "hello world",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:07:43.228Z",
          "type" => "std-lqb",
           "key" => "value22222222222222222222222222222222222222222222",
          "tags" => [
        [0] "add",
        [1] "xxyy",
        [2] "abc"
    ],
          "host" => "localhost.localdomain"
}

##### Branching on tags:
[[email protected] test]# vim stdin_2.conf
input {
    stdin {
        add_field => {"key11" => "value22"}
        codec => "plain"
        tags => ["add","xxyy"]
        type => "std"
    }
}
output {
    if "tttt" in [tags] {
        stdout {
            codec => rubydebug {}
        }
    } else if "add" in [tags] {
        stdout {
            codec => json
        }
    }
}
[[email protected] logstash]# /usr/local/logstash/bin/logstash -f test/stdin_2.conf
Settings: Default pipeline workers: 1
Logstash startup completed
yyxxx
{"message":"yyxxx","@version":"1","@timestamp":"2017-05-24T09:32:25.840Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
{"message":"","@version":"1","@timestamp":"2017-05-24T09:32:32.480Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
xxyy
{"message":"xxyy","@version":"1","@timestamp":"2017-05-24T09:32:42.249Z","type":"std","key11":"value22","tags":["add","xxyy"],"host":"localhost.localdomain"}
(2) Reading files. Logstash uses a Ruby gem called FileWatch to watch files for changes. The library supports glob expansion of file paths, and it keeps a database file named .sincedb that tracks the current read position in every watched log file, so you do not need to worry about Logstash missing your data.
[[email protected] test]# cat log.conf
input {
    file {
        path => "/usr/local/nginx/logs/access.log"
        type => "system"
        start_position => "beginning"
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[[email protected] logstash]# /usr/local/logstash/bin/logstash -f test/log.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
       "message" => "192.168.181.231 - - [24/May/2017:15:04:29 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:39:16.600Z",
          "path" => "/usr/local/nginx/logs/access.log",
          "host" => "localhost.localdomain",
          "type" => "system"
}
{
       "message" => "192.168.181.231 - - [24/May/2017:15:04:32 +0800] \"GET / HTTP/1.1\" 502 537 \"-\" \"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36\" \"-\"",
      "@version" => "1",
    "@timestamp" => "2017-05-24T09:39:16.614Z",
          "path" => "/usr/local/nginx/logs/access.log",
          "host" => "localhost.localdomain",
          "type" => "system"
}
Explanation:
Several useful options control the behavior of the FileWatch library:
- discover_interval
  How often Logstash checks the watched path for new files. Defaults to 15 seconds.
- exclude
  Files you do not want watched can be excluded here; glob expansion is supported, just as with path.
- sincedb_path
  If you do not want the default $HOME/.sincedb (C:\Windows\System32\config\systemprofile\.sincedb on Windows), use this option to place the sincedb file somewhere else.
- sincedb_write_interval
  How often Logstash writes out the sincedb file. Defaults to 15 seconds.
- stat_interval
  How often Logstash checks watched files for updates. Defaults to 1 second.
- start_position
  Where Logstash starts reading a file. The default is the end, so the Logstash process behaves much like tail -F. If you want to import existing data, set this to "beginning"; Logstash then reads from the start of the file, somewhat like cat, but instead of exiting at the last line it keeps tailing, again like tail -F.
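Putting these options together, a file input block might look like the following sketch; the paths, intervals, and exclude pattern are illustrative values, not taken from the examples in this article:

```
input {
    file {
        path => ["/usr/local/nginx/logs/*.log"]
        exclude => "*.gz"
        type => "system"
        start_position => "beginning"
        discover_interval => 15
        stat_interval => 1
        sincedb_path => "/var/lib/logstash/sincedb-nginx"
        sincedb_write_interval => 15
    }
}
```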
Note
- If you want to import existing data into Elasticsearch, you will usually also need the filter/date plugin to rewrite the default "@timestamp" field value. This is covered later.
- FileWatch only supports absolute file paths and does not recurse into directories automatically. If you need multiple specific files, list them explicitly as an array.
- LogStash::Inputs::File only initializes one FileWatch object, during the registration phase at process startup. It therefore cannot support fluentd-style paths such as path => "/path/to/%{+yyyy/MM/dd/hh}.log". To achieve the same effect, you have to write path => "/path/to/*/*/*/*.log".
- start_position only takes effect when a file has never been watched before. If the sincedb file already contains an inode record for the file, Logstash resumes reading from the recorded position. When testing repeatedly, you must delete the sincedb file before each run.
- Windows has no concept of inodes, so some Logstash versions are unreliable when watching files on Windows. On Windows, consider using nxlog as the collection agent instead.
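Since start_position is ignored once a file has a sincedb record, repeated tests need the sincedb file removed first. A minimal shell sketch, assuming the default $HOME/.sincedb location (adjust the path if you set sincedb_path):

```shell
# Simulate a leftover sincedb record from a previous run
# (this record content is only a stand-in, not real Logstash state).
SINCEDB="$HOME/.sincedb"
echo "268528 0 2049 12345" > "$SINCEDB"

# Remove it so start_position => "beginning" takes effect again on the next run
rm -f "$SINCEDB"
test ! -e "$SINCEDB" && echo "sincedb removed"   # prints: sincedb removed
```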
(3) TCP input. Down the road you may use a Redis server or another message-queue system in the Logstash broker role. But Logstash also ships its own TCP/UDP input plugins, which are serviceable for ad-hoc jobs, especially in test environments.
[[email protected] test]# cat tcp.conf
input {
    tcp {
        port => 8888
        mode => "server"
        ssl_enable => false
    }
}
output {
    stdout {
        codec => rubydebug
    }
}
[[email protected] logstash]# /usr/local/logstash/bin/logstash -f test/tcp.conf
Settings: Default pipeline workers: 1
Logstash startup completed
{
       "message" => "GET /jenkins/ HTTP/1.1\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:53.980Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
{
       "message" => "Host: 192.168.180.9:8888\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:54.175Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
{
       "message" => "Connection: keep-alive\r",
      "@version" => "1",
    "@timestamp" => "2017-05-24T10:09:54.180Z",
          "host" => "192.168.181.231",
          "port" => 59426
}
Note: stop the application occupying port 8888 first, then start Logstash; the log entries shown above will then be produced.