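After installing ELK and getting it running, I was close to despair: even on a machine with 16 GB of RAM and a 16-core CPU, it kept throwing errors.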
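1. Logstash and Elasticsearch report errors at the same time
Logstash logs a large number of errors, probably because Elasticsearch is using too much heap and has not been tuned: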
retrying failed action with response code: 503 {:level=>:warn}
too many attempts at sending event. dropping: 2016-06-16T05:44:54.464Z %{host} %{message} {:level=>:error}
Elasticsearch also logs a large number of errors:
too many open files
The file-descriptor limit is too small: "max_file_descriptors" : 2048
# curl http://localhost:9200/_nodes/process\?pretty
{
"cluster_name" : "elasticsearch",
"nodes" : {
"ZLgPzMqBRoyDFvxoy27Lfg" : {
"name" : "Mass Master",
"transport_address" : "inet[/192.168.153.200:9301]",
"host" : "localhost",
"ip" : "127.0.0.1",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[/192.168.153.200:9200]",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 943,
"max_file_descriptors" : 2048,
"mlockall" : true
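Reading the limit out of that response by eye gets tedious when there are several nodes. The following is only a minimal sketch of how the check could be scripted; the function names extract_max_fds and fd_limit_ok are my own (hypothetical), the default threshold of 65535 is simply the value used later in this post, and it assumes a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper: read the JSON returned by /_nodes/process on stdin
# and print every max_file_descriptors value, one per line.
extract_max_fds() {
    grep -o '"max_file_descriptors" *: *[0-9][0-9]*' | grep -o '[0-9][0-9]*$'
}

# Hypothetical check: succeed when the reported limit ($1) is at least
# the wanted minimum ($2, defaulting to 65535).
fd_limit_ok() {
    [ "$1" -ge "${2:-65535}" ]
}

# Example use against a running node (not executed here):
#   curl -s http://localhost:9200/_nodes/process | extract_max_fds |
#   while read -r n; do
#       fd_limit_ok "$n" || echo "limit too low: $n"
#   done
```

Because the pattern tolerates optional spaces around the colon, it should work with or without ?pretty in the URL.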
Solution:
Raise the open-file limit:
# ulimit -n 65535
Make the setting take effect at boot:
# vi /etc/profile
Add the following line to the Elasticsearch startup script, then restart Elasticsearch:
# vi /home/elk/elasticsearch-1.6.0/bin/elasticsearch
ulimit -n 65535
After the restart, confirm that the new limit is in effect:
# curl http://localhost:9200/_nodes/process\?pretty
{
"cluster_name" : "elasticsearch",
"nodes" : {
"_QXVsjL9QOGMD13Eb6t7Ag" : {
"name" : "Ocean",
"transport_address" : "inet[/192.168.153.200:9301]",
"host" : "localhost",
"ip" : "127.0.0.1",
"version" : "1.6.0",
"build" : "cdd3ac4",
"http_address" : "inet[/192.168.153.200:9200]",
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 1693,
"max_file_descriptors" : 65535,
"mlockall" : true
}
}
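A bare ulimit -n 65535 in the startup script fails silently if the hard limit does not allow it, and it would lower the limit if it were already higher. As a hedged sketch only, a small guard like the one below could be placed near the top of the script instead; the function name ensure_nofile_limit is hypothetical and is not part of Elasticsearch.

```shell
# Hypothetical guard: make sure the open-file limit of this shell is at
# least $1, raising it only when the current value is lower.
ensure_nofile_limit() {
    want=$1
    have=$(ulimit -n)
    if [ "$have" != "unlimited" ] && [ "$have" -lt "$want" ]; then
        # This can fail for a non-root user when the hard limit is lower.
        ulimit -n "$want" 2>/dev/null || true
        have=$(ulimit -n)
    fi
    if [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]; then
        return 0
    fi
    echo "open-file limit is $have, need at least $want" >&2
    return 1
}

# Example use in a startup script (not executed here):
#   ensure_nofile_limit 65535 || exit 1
```

Failing loudly at startup seems preferable to finding "too many open files" in the logs hours later.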
2. Out of memory errors
The Elasticsearch configuration file after tuning:
# egrep -v '^$|^#' /home/elk/elasticsearch-1.6.0/config/elasticsearch.yml
bootstrap.mlockall: true
http.max_content_length: 2000mb
http.compression: true
index.cache.field.type: soft
index.cache.field.max_size: 50000
index.cache.field.expire: 10m
For bootstrap.mlockall: true, the following also needs to be set:
# ulimit -l unlimited
# vi /etc/sysctl.conf
vm.max_map_count=262144
vm.swappiness = 1
Check the resulting limits:
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 127447
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 127447
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Raise the per-user process limit as well:
# vi /etc/security/limits.d/90-nproc.conf
* soft nproc 320000
root soft nproc unlimited
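Values edited into /etc/sysctl.conf are not live until the file is reloaded (sysctl -p does that, as root) or the machine reboots. To compare what the file says with what the kernel is actually using, something like the following sketch might help; sysctl_conf_value is a hypothetical helper name of mine, and it only handles simple key = value lines.

```shell
# Hypothetical helper: print the value assigned to key $1 in the
# sysctl.conf-style file $2 (if the key appears twice, the last one wins).
sysctl_conf_value() {
    # Escape the dots in the key so they match literally in the regex.
    key=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$2" | tail -n 1
}

# Example use (not executed here): compare the file with the live values.
#   for k in vm.max_map_count vm.swappiness; do
#       echo "$k file=$(sysctl_conf_value "$k" /etc/sysctl.conf) live=$(sysctl -n "$k")"
#   done
```

If the two columns differ, the file has probably not been reloaded since it was edited.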
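3. Elasticsearch status is yellow
Elasticsearch uses three colors to indicate cluster health: green, yellow, and red.
green: all primary and replica shards are available
yellow: all primary shards are available, but not all replica shards are
red: not all primary shards are available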
# curl -XGET http://localhost:9200/_cluster/health\?pretty
{
"cluster_name" : "elasticsearch",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 1,
"active_primary_shards" : 161,
"active_shards" : 161,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 161,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0
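Solution: set up an Elasticsearch cluster (I will cover this in the next post). With only one data node, the replica shards have nowhere to go, because a replica is never allocated to the same node as its primary; that is why all 161 replicas show up as unassigned.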
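The three health colors described above are easy to check from a script, for example in a cron job. This is just a rough sketch under my own assumptions: the function names cluster_status and describe_status are hypothetical, and the JSON is parsed with sed rather than a proper JSON tool, which is fine for this flat response but not in general.

```shell
# Hypothetical helper: read the /_cluster/health JSON on stdin and print
# the value of the "status" field (green, yellow or red).
cluster_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: turn a status color into a one-line explanation.
# Returns non-zero for anything that is not green, yellow or red.
describe_status() {
    case "$1" in
        green)  echo "all primary and replica shards are available" ;;
        yellow) echo "all primary shards are available, some replicas are not" ;;
        red)    echo "some primary shards are unavailable" ;;
        *)      echo "unknown status: $1"; return 1 ;;
    esac
}

# Example use against a running node (not executed here):
#   s=$(curl -s http://localhost:9200/_cluster/health | cluster_status)
#   describe_status "$s"
```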
4. Kibana "not indexed" error
https://rafaelmt.net/en/2015/09/01/kibana-tutorial/#refresh-fields
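The fields in Kibana's index change frequently as new events arrive, so Kibana charts sometimes show a "not indexed" error.
Solution:
Open Kibana, go to Settings, click Indices, then click logstash-*. Click the refresh icon and the error goes away.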