
A Guide to ElastAlert, a Monitoring Tool for Elasticsearch

Introduction to ElastAlert

ElastAlert is a simple framework for alerting on anomalies, spikes, or other patterns of interest from data in Elasticsearch.

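In short, ElastAlert is a framework for monitoring and alerting. It continuously polls Elasticsearch and queries the data. When the results satisfy rules you have configured, it takes the corresponding follow-up action, such as sending an email.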
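The polling model described above can be sketched in a few lines of Python. This is a minimal illustration, not ElastAlert's actual code. `fetch_events`, `rule_matches` and `send_alert` are hypothetical callables that stand in for an Elasticsearch query, a rule check and an alerter. `run_every` is only loosely modeled on ElastAlert's polling interval.

```python
import time
from typing import Callable, Dict, List

Event = Dict[str, object]
FetchFn = Callable[[float, float], List[Event]]
MatchFn = Callable[[List[Event]], bool]
AlertFn = Callable[[List[Event]], None]


def poll_once(fetch_events: FetchFn, rule_matches: MatchFn, send_alert: AlertFn,
              start: float, end: float) -> bool:
    """Run one polling cycle over the time window [start, end).

    Returns True if an alert was sent.
    """
    events = fetch_events(start, end)  # stand-in for an Elasticsearch query
    if rule_matches(events):           # stand-in for the configured rule
        send_alert(events)             # stand-in for email, JIRA, SNS, ...
        return True
    return False


def poll_forever(fetch_events: FetchFn, rule_matches: MatchFn, send_alert: AlertFn,
                 run_every: float = 60.0) -> None:
    """Query the most recent window every run_every seconds, forever."""
    last = time.time()
    while True:
        time.sleep(run_every)
        now = time.time()
        poll_once(fetch_events, rule_matches, send_alert, last, now)
        last = now
```

For example, `poll_once(lambda s, e: [{"msg": "timeout"}] * 3, lambda ev: len(ev) >= 3, print, 0.0, 60.0)` prints the three fake events and returns True. Each cycle only queries the window since the previous run, so a document is not checked twice.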

It has the following features:

  • Simple
  • Well documented
  • Active community

Supported alert types include:

  • Command line
  • Email
  • JIRA
  • SNS
  • And more

Installation

Environment

  • Linux CentOS
  • ElasticSearch 5.4.0
  • Kibana 5.4.0
pip install elastalert
pip install "elasticsearch==5.4.0"

Configuration

Initialize the ElastAlert index

Run elastalert-create-index and enter the Elasticsearch host and port. For the remaining prompts, press Enter to accept the defaults.

elastalert-create-index
Enter Elasticsearch host: xxx
Enter Elasticsearch port: 9200
Use SSL? t/f: f
Enter optional basic-auth username (or leave blank):
Enter optional basic-auth password (or leave blank):
Enter optional Elasticsearch URL prefix (prepends a
string to the URL of every request):
New index name? (Default elastalert_status)
Name of existing index to copy? (Default None)
Elastic Version:5
Mapping used for string:{'index': 'not_analyzed', 'type': 'string'}
Index elastalert_status already exists. Skipping index creation.

Edit config.yaml

  • Download
# When installed via pip, you must download the example config file from GitHub yourself
wget https://raw.githubusercontent.com/Yelp/elastalert/master/config.yaml.example
# Rename it to config.yaml
mv config.yaml.example config.yaml
  • Edit the configuration

Set es_host and es_port. The other settings can be left at their defaults.


Configuring rules

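Rule types

  • any: alert on every match;
  • blacklist: the value of the compare_key field matches any entry in the blacklist array;
  • whitelist: the value of the compare_key field matches none of the entries in the whitelist array;
  • change: for the same query_key, the value of the compare_key field changes within timeframe;
  • frequency: for the same query_key, num_events matching (filtered) events occur within timeframe;
  • spike: for the same query_key, the event counts of two consecutive timeframe windows differ by a ratio greater than spike_height. spike_type sets the direction: up, down or both. threshold_ref sets a minimum event count for the previous (reference) window, and threshold_cur sets a minimum for the current window. If a window's count is below its minimum, no alert is triggered;
  • flatline: the number of events within timeframe is below threshold;
  • new_term: a value appears in fields that was not among the top terms_size (default 50) values seen over the preceding terms_window_size (default 30 days);
  • cardinality: for the same query_key, the number of distinct values of cardinality_field within timeframe exceeds max_cardinality or falls below min_cardinality.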
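The following Python sketch makes several of these definitions concrete. It shows the match conditions for blacklist, whitelist, flatline, spike and cardinality. It works on values and counts that have already been extracted, and it ignores query_key grouping and time bucketing. It is not ElastAlert's implementation. In particular, two choices are assumptions of this sketch: the exact boundary comparisons (`>=` versus `>`), and not reporting a spike when the reference window is empty.

```python
from typing import Iterable, Optional, Sequence


def blacklist_match(value: str, blacklist: Sequence[str]) -> bool:
    """blacklist: the compare_key value is one of the listed values."""
    return value in blacklist


def whitelist_match(value: str, whitelist: Sequence[str]) -> bool:
    """whitelist: the compare_key value is none of the listed values."""
    return value not in whitelist


def flatline_match(count: int, threshold: int) -> bool:
    """flatline: fewer than `threshold` events arrived within the timeframe."""
    return count < threshold


def spike_match(ref_count: int, cur_count: int, spike_height: float,
                spike_type: str = "both",
                threshold_ref: Optional[int] = None,
                threshold_cur: Optional[int] = None) -> bool:
    """spike: compare the current window with the previous (reference) window."""
    # Too little data in either window: no alert.
    if threshold_ref is not None and ref_count < threshold_ref:
        return False
    if threshold_cur is not None and cur_count < threshold_cur:
        return False
    # Assumption of this sketch: an empty reference window gives no baseline.
    if ref_count == 0:
        return False
    up = cur_count >= ref_count * spike_height    # grew by spike_height times or more
    down = cur_count <= ref_count / spike_height  # shrank by spike_height times or more
    if spike_type == "up":
        return up
    if spike_type == "down":
        return down
    return up or down  # "both"


def cardinality_match(values: Iterable[str],
                      max_cardinality: Optional[int] = None,
                      min_cardinality: Optional[int] = None) -> bool:
    """cardinality: the number of distinct cardinality_field values is out of range."""
    distinct = len(set(values))
    if max_cardinality is not None and distinct > max_cardinality:
        return True
    if min_cardinality is not None and distinct < min_cardinality:
        return True
    return False
```

For example, with spike_height set to 3, a jump from 10 events to 30 counts as an upward spike, and a drop from 30 to 10 counts as a downward one.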

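Here is an example that comes up often in big data projects: a query-timeout alert for a data service. The rule alerts when the number of query timeouts within a given time window exceeds a threshold.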
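This example maps onto the frequency rule type. The configuration used below sets num_events: 10, a timeframe of 1 minute, query_key: name and a realert of 5 minutes. The Python sketch here shows the idea behind those settings: a sliding window of event timestamps per query_key value, plus suppression of repeat alerts. The class `FrequencyRule` is hypothetical and simplifies ElastAlert's real behavior. It also assumes events arrive in time order.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict


class FrequencyRule:
    """Hypothetical, simplified frequency rule: num_events within timeframe, per query_key."""

    def __init__(self, num_events: int, timeframe: timedelta, realert: timedelta) -> None:
        self.num_events = num_events
        self.timeframe = timeframe
        self.realert = realert
        # One sliding window of timestamps per query_key value (e.g. per API name).
        self.windows: Dict[str, Deque[datetime]] = defaultdict(deque)
        self.last_alert: Dict[str, datetime] = {}

    def add_event(self, key: str, ts: datetime) -> bool:
        """Record one matching event for `key`; return True if an alert should fire."""
        window = self.windows[key]
        window.append(ts)
        # Drop events that have slid out of the timeframe.
        while ts - window[0] > self.timeframe:
            window.popleft()
        if len(window) < self.num_events:
            return False
        # realert: suppress repeat alerts for the same key for a while.
        last = self.last_alert.get(key)
        if last is not None and ts - last < self.realert:
            return False
        self.last_alert[key] = ts
        window.clear()  # start counting afresh after an alert
        return True
```

The per-key dictionaries are what query_key buys you. A burst of timeouts on one API name does not count toward another API's threshold, and realert is tracked per API as well.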

Create an example_rules directory and, inside it, a file named query_timeout_frequency.yaml with the following configuration:

# Alert when the rate of events exceeds a threshold

# (Optional)
# Elasticsearch host
es_host: xxx

# (Optional)
# Elasticsearch port
es_port: 9200

# (Optional) Connect with SSL to Elasticsearch
#use_ssl: True

# (Optional) basic-auth username and password for Elasticsearch
#es_username: someusername
#es_password: somepassword

# (Required)
# Rule name, must be unique
name: query timeout

query_key:
  - name

realert:
  minutes: 5

# (Required)
# Type of alert.
# The frequency rule type alerts when num_events events occur within timeframe time
type: frequency

# (Required)
# Index to search, wildcard supported
index: dataservice-custom-api-log*

# (Required, frequency specific)
# Alert when this many documents matching the query occur within a timeframe
num_events: 10

# (Required, frequency specific)
# num_events must occur within this amount of time to trigger an alert
timeframe:
  #hours: 4
  minutes: 1

# (Required)
# A list of Elasticsearch filters used to find events
# These filters are joined with AND and nested in a filtered query
# For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html

#filter:
#- term:
#    some_field: "some_value"

filter:
- query_string:
    query: "logtextJson.totalUsed:>5000 AND -host:(zhike1 OR zhike2 OR zhike3)"

smtp_host: smtp.exmail.qq.com
smtp_port: 465
smtp_ssl: true

smtp_auth_file: /data2/elastalert/config/smtp_auth_file.yaml
# Reply-to address
email_reply_to: xxx@xxx.com
# Sender address
from_addr: xxx@xxx.com


# (Required)
# The alert to use when a match is found
alert:
- "email"

# (Required, email specific)
# A list of email addresses to send alerts to
email:
- "xxx@xxx.com"

alert_subject: "Query timeouts on the big data cluster exceeded the limit: {} log entries matched, {} matches"
alert_subject_args:
- num_hits
- num_matches

alert_text_type: alert_text_only

alert_text: |
  Hello, query timeouts on the main big data cluster have exceeded the limit. Please check the server status!
  > Requests matched before this email was sent: {}
  > Matches before this email was sent: {}
  > Time of occurrence: {}
  > timestamp: {}
  > remoteip: {}
  > request: {}
  > loglevel: {}
  > Log source: {}

alert_text_args:
- num_hits
- num_matches
- logtextJson.out
- "@timestamp"
- logtextJson.requestIp
- logtextJson.requestURI
- loglevel
- source

The referenced smtp_auth_file.yaml holds the SMTP login credentials:

user: xxx@xxx.com
password: xxx

Key options explained

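  • alert_text: the body of the alert email (plain text by default; set email_format: html to send it as HTML);
  • alert_text_args: the fields whose values are substituted, in order, into the {} placeholders of alert_text.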
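The sketch below is a rough illustration of how alert_subject_args and alert_text_args fill the {} placeholders. It looks up each named field and passes the values positionally to str.format. Dotted names such as logtextJson.requestIp walk into nested objects. The helper names are hypothetical. The <MISSING VALUE> placeholder is assumed to mirror ElastAlert's default alert_missing_value. For brevity, num_hits and num_matches are stored in the same dict as the document fields.

```python
from typing import Any, Dict, List, Optional

# Assumed to mirror ElastAlert's default for alert_missing_value.
MISSING_VALUE = "<MISSING VALUE>"


def lookup_dotted(doc: Dict[str, Any], key: str) -> Optional[Any]:
    """Resolve a field name such as 'logtextJson.requestIp' in a nested dict."""
    if key in doc:  # literal keys such as '@timestamp' or 'num_hits'
        return doc[key]
    current: Any = doc
    for part in key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def render(template: str, arg_names: List[str], match: Dict[str, Any]) -> str:
    """Fill the {} placeholders of `template` with the named fields, in order."""
    values = []
    for name in arg_names:
        value = lookup_dotted(match, name)
        values.append(MISSING_VALUE if value is None else value)
    return template.format(*values)
```

The placeholders are purely positional, so the order of alert_text_args must match the order of the {} markers in alert_text. If a field is absent from a log entry, a visible placeholder appears instead of a crash.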

Reference:
Rule Types and Configuration Options — ElastAlert 0.0.1 documentation

Testing the rule


Starting ElastAlert

nohup elastalert --config config.yaml --rule example_rules/query_timeout_frequency.yaml > nohup.out 2>&1 &

Problems encountered

The printed times are in the wrong time zone: they are eight hours behind Beijing time.

Fix this by setting the timezone option of the Logstash date filter to the Beijing time zone.
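Background on the eight-hour gap: Elasticsearch stores dates such as @timestamp in UTC. Beijing time is UTC+8 with no daylight saving time, so a UTC value printed as-is appears exactly eight hours behind. The minimal Python sketch below illustrates that offset by converting a UTC timestamp for display. It is not a substitute for the Logstash fix described above. The function name `to_beijing` is hypothetical, and the sketch needs Python 3.7+ for `datetime.fromisoformat`.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+8 offset; China does not observe daylight saving time.
BEIJING = timezone(timedelta(hours=8), name="Asia/Shanghai")


def to_beijing(utc_iso: str) -> str:
    """Convert an ISO-8601 UTC timestamp like '2019-06-01T02:00:00Z' to Beijing time."""
    # fromisoformat() does not accept a trailing 'Z' before Python 3.11.
    ts = datetime.fromisoformat(utc_iso.replace("Z", "+00:00"))
    return ts.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M:%S")
```

For example, `to_beijing("2019-06-01T02:00:00Z")` gives "2019-06-01 10:00:00", which is the eight-hour shift the alert emails were missing.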

Error

Traceback (most recent call last):
  File "/usr/local/anaconda2/bin/elastalert", line 11, in <module>
    sys.exit(main())
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elastalert/elastalert.py", line 1925, in main
    client.start()
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elastalert/elastalert.py", line 1106, in start
    self.run_all_rules()
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elastalert/elastalert.py", line 1158, in run_all_rules
    self.send_pending_alerts()
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elastalert/elastalert.py", line 1534, in send_pending_alerts
    pending_alerts = self.find_recent_pending_alerts(self.alert_time_limit)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elastalert/elastalert.py", line 1526, in find_recent_pending_alerts
    size=1000)
  File "/usr/local/anaconda2/lib/python2.7/site-packages/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
TypeError: search() got an unexpected keyword argument 'doc_type'

Cause: version mismatch

Check the installed versions:

pip freeze | grep elas
You are using pip version 9.0.1, however version 19.1.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
elastalert==0.1.39
elasticsearch==7.0.1

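The elasticsearch client library is 7.x, while our Elasticsearch cluster is 5.4.0, so uninstall the client and reinstall the matching version: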

pip uninstall elasticsearch
pip install elasticsearch==5.4.0
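The underlying rule is that the major version of the elasticsearch-py client should match the major version of the cluster. A minimal Python sketch of that check follows. `client_matches_cluster` and `check_installed` are hypothetical helpers. `check_installed` assumes the elasticsearch-py 5.x to 7.x constructor style. It reads the client version from `elasticsearch.__versionstr__` and the cluster version from the `version.number` field returned by `Elasticsearch.info()`.

```python
def major(version: str) -> int:
    """Major component of a version string such as '7.0.1'."""
    return int(version.split(".")[0])


def client_matches_cluster(client_version: str, cluster_version: str) -> bool:
    """True if the client library and the cluster share the same major version."""
    return major(client_version) == major(cluster_version)


def check_installed(host: str = "localhost", port: int = 9200) -> bool:
    """Compare the installed elasticsearch-py client with a live cluster.

    Assumes the elasticsearch-py 5.x-7.x constructor style. The import is done
    lazily so the helpers above work without the package installed.
    """
    import elasticsearch

    es = elasticsearch.Elasticsearch([{"host": host, "port": port}])
    cluster_version = es.info()["version"]["number"]
    return client_matches_cluster(elasticsearch.__versionstr__, cluster_version)
```

With the versions from the pip freeze output above, `client_matches_cluster("7.0.1", "5.4.0")` is False, which is exactly the mismatch behind the doc_type error.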

Error when sending email

ERROR:root:Error while running alert email: Error connecting to SMTP host: Connection unexpectedly closed

Our mail server uses SSL-encrypted connections (port 465), so the following setting is required:

smtp_ssl: true
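Why this matters: port 465 expects TLS from the very first byte (implicit TLS). A plain SMTP connection to it is typically dropped by the server, which surfaces as the "Connection unexpectedly closed" error above. The Python sketch below uses the standard library's smtplib to show the distinction. `smtp_class` and `send_test_mail` are hypothetical helpers for checking the mail settings outside ElastAlert. They are not part of ElastAlert itself.

```python
import smtplib
from email.message import EmailMessage
from typing import Type, Union


def smtp_class(smtp_ssl: bool) -> Union[Type[smtplib.SMTP], Type[smtplib.SMTP_SSL]]:
    """Pick the smtplib class that matches an smtp_ssl-style flag.

    smtplib.SMTP_SSL wraps the socket in TLS before the SMTP greeting, which is
    what implicit-TLS ports such as 465 expect; smtplib.SMTP starts in plain text.
    """
    return smtplib.SMTP_SSL if smtp_ssl else smtplib.SMTP


def send_test_mail(host: str, port: int, smtp_ssl: bool, user: str, password: str,
                   from_addr: str, to_addr: str) -> None:
    """Send one test message using the same host/port/SSL settings as the rule file."""
    msg = EmailMessage()
    msg["Subject"] = "ElastAlert SMTP test"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("If you can read this, the SMTP settings work.")
    with smtp_class(smtp_ssl)(host, port, timeout=10) as server:
        server.login(user, password)
        server.send_message(msg)
```

With the values from this article, you would call `send_test_mail("smtp.exmail.qq.com", 465, True, "xxx@xxx.com", "xxx", "xxx@xxx.com", "xxx@xxx.com")` after replacing each xxx. If this succeeds but ElastAlert still fails, the problem is more likely in the rule file than in the mail server.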

Note: replace every xxx in this article with the values for your own environment.