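PMM (Percona Monitoring and Management) is Percona's open-source monitoring system, used mainly to monitor performance metrics of databases such as MySQL and MongoDB. The latest version at the time of writing is 1.11.0. I spent some time evaluating it for work, and this post describes how to use PMM to quickly build a MySQL monitoring system in a production environment, covering the basic principles, deployment, and a comparison of pros and cons.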

Architecture

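The key to understanding PMM's architecture is that it is a Prometheus-based system: several PMM components are essentially wrappers around the corresponding Prometheus modules. The architecture diagram is as follows: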

PMM Architecture Overview

The system is split into a client part and a server part:

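  • Client: the exporter daemons, plus the pmm-admin command-line tool used to manage them
  • Server - Metrics Monitor: Prometheus with local storage, Consul for service discovery, and Grafana for dashboards; all of this is the Prometheus technology stack
  • Server - Query Analytics: Percona's own query analytics component, which analyzes the slow queries collected on the database side, stores them in MySQL, and is registered in Grafana as an additional datasource for display
  • Server - Orchestrator: the third-party tool orchestrator, which visualizes and manages the replication topology of MySQL instances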

Deployment

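Deploying PMM is considerably simpler than deploying Prometheus: both the client and the server are essentially one-command installs, and ongoing maintenance is relatively easy as well.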

Prerequisite

The OS is CentOS 7, and I install with yum here. I recommend adding the Percona repository on both the server and the client:

sudo yum install http://www.percona.com/downloads/percona-release/redhat/0.1-4/percona-release-0.1-4.noarch.rpm
sudo yum install -y percona-toolkit
# check by yum search
sudo yum search percona | grep -i pmm-
Repodata is over 2 weeks old. Install yum-cron? Or run: yum makecache fast
pmm-client.x86_64 : Percona Monitoring and Management Client

Privilege

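The monitored MySQL instance needs a user with the appropriate privileges for pmm-client. The pmm-admin tool offers a --create-user option that creates a suitable account automatically, or you can create the user yourself: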

-- SUPER is required when the QAN agent needs to modify MySQL options
-- if you use orchestrator, REPLICATION CLIENT and REPLICATION SLAVE are needed as well
GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO 'pmm'@'localhost' IDENTIFIED BY 'xxx' WITH MAX_USER_CONNECTIONS 10;
GRANT SELECT, UPDATE, DELETE, DROP ON performance_schema.* TO 'pmm'@'localhost';
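
When the same account has to be created on many instances, it can help to render the statements above from a small shell function instead of editing them by hand. The following is only a sketch: pmm_grant_sql and its three arguments are hypothetical names introduced for illustration and are not part of PMM, and it assumes the user, host, and password contain no single quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PMM): print the two GRANT statements
# from this post for the given user, host and password.
# Assumption: none of the arguments contains a single quote.
pmm_grant_sql() {
  local user="$1" host="$2" password="$3"
  cat <<SQL
GRANT SELECT, PROCESS, SUPER, REPLICATION CLIENT, RELOAD ON *.* TO '${user}'@'${host}' IDENTIFIED BY '${password}' WITH MAX_USER_CONNECTIONS 10;
GRANT SELECT, UPDATE, DELETE, DROP ON performance_schema.* TO '${user}'@'${host}';
SQL
}
```

The output can be reviewed first and then piped into the mysql client, for example pmm_grant_sql pmm localhost xxx | mysql -u root -p.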

Accessibility

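The client and the server must be able to reach each other over the network; otherwise monitoring data cannot be pushed and health checks cannot run. As with Prometheus, unless a push-gateway is used, connectivity is required in both directions: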

Client => Server(80)

The client uses the server's Consul API, Query Analytics API, and Prometheus API to push monitoring data, so port 80/443 on the server must be open to the clients.

Client(42000,42002…) <= Server

The server scrapes the state of the exporters on the client at a fixed interval, for example the up metric.

Afterwards, network connectivity can be verified with the pmm-admin check-network command.
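
Before pmm-client is installed, the two directions described above can also be probed with plain shell. This is a minimal sketch, not a PMM feature: port_open and check_ports are hypothetical helpers, and they assume bash (for its /dev/tcp pseudo-device) and the coreutils timeout command are available.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of PMM) for a quick TCP reachability check.
# Assumes bash's /dev/tcp support and coreutils' timeout are available.

# Succeed if a TCP connection to host:port can be opened within 3 seconds.
port_open() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Print one line per host:port argument; return non-zero if any is unreachable.
check_ports() {
  local rc=0 target
  for target in "$@"; do
    if port_open "${target%:*}" "${target##*:}"; then
      echo "ok   $target"
    else
      echo "FAIL $target"
      rc=1
    fi
  done
  return "$rc"
}
```

On a client this could be run as check_ports SERVER:80, and on the server as check_ports CLIENT:42000 CLIENT:42002, where SERVER and CLIENT stand for your own addresses.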

Launching Server

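PMM Server can be installed in three ways: as a Docker container, as a virtual machine instance, or as an EC2 image. This post covers the Docker method because it is the most general and the simplest.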

# install docker-engine if you don't have it yet
sudo yum install docker
sudo systemctl start docker
# pull pmm image
docker pull percona/pmm-server:latest
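# create a data-only container for persistent PMM data
# (the -v paths are directories inside the container; the official PMM 1.x
# instructions use /var/lib/mysql and /var/lib/grafana for MySQL and Grafana)
docker create \
-v /opt/prometheus/data \
-v /opt/consul-data \
-v /var/lib/mysql \
-v /var/lib/grafana \
--name pmm-data \
percona/pmm-server:latest /bin/true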
# create and launch the PMM Server container
# (comments cannot follow a trailing backslash, so the options are explained here)
#   METRICS_RESOLUTION=5s           collect metrics at a minimum resolution of 5s
#   METRICS_RETENTION=600h          keep time-series data for 600 hours
#   QUERIES_RETENTION=7             keep query data for 7 days
#   ORCHESTRATOR_ENABLED=false      disable the orchestrator module
#   SERVER_USER / SERVER_PASSWORD   credentials for HTTP basic auth
docker run -d \
-p 80:80 \
--volumes-from pmm-data \
--name pmm-server \
--restart always \
-e METRICS_RESOLUTION=5s \
-e METRICS_RETENTION=600h \
-e QUERIES_RETENTION=7 \
-e ORCHESTRATOR_ENABLED=false \
-e SERVER_USER=pmm \
-e SERVER_PASSWORD=xxx \
percona/pmm-server:latest
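
Note that METRICS_RETENTION is given in hours while QUERIES_RETENTION is given in days, so the 600h used above corresponds to 25 days of metrics. If you prefer to think in days, the conversion can be sketched as follows; days_to_hours is a hypothetical helper introduced here, not something PMM provides.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PMM): convert a retention period in days
# to the "<hours>h" form used for METRICS_RETENTION in this post.
days_to_hours() {
  echo "$(( $1 * 24 ))h"
}
```

It could then be used when starting the container, for example -e METRICS_RETENTION="$(days_to_hours 25)".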

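Once the commands above have run, open http://serverip/ to enter Grafana. The landing page is PMM's Home Dashboard, which comes with system metrics for pmm-server itself out of the box.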

Adding Metrics

Both Linux and MySQL monitoring can be added on the client with the pmm-admin tool:

# install pmm-client (provides pmm-admin)
sudo yum install -y pmm-client
# point the client at the PMM server
pmm-admin config --server ${pmm_server_ip} --server-user pmm --server-password xxx --client-name ${client_name}
# add linux metrics
pmm-admin add linux:metrics --service-port=42000
# add mysql metrics, with table statistics disabled to save space
pmm-admin add mysql:metrics --service-port=42002 -u pmm -p xxx -h localhost -P 3306 --disable-tablestats=true
# or, if you want to keep table statistics enabled on some instances
pmm-admin add mysql -u pmm -p xxx --query-source=slowlog --disable-tablestats-limit=10000
# add a remote MySQL: run this on a host that can reach the remote instance, e.g. a jump host connecting to an RDS instance
pmm-admin add mysql:metrics --host xxxx.rds.amazonaws.com --user pmm --password xxx --port 3306 rds-${client_name}
# add the Query Analytics agent to collect slow queries
pmm-admin add mysql:queries -u pmm -p xxx -h localhost -P 3306 --query-source=slowlog
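
When many hosts need the same services, the commands above can be wrapped in a small script with a preview mode. The following is a sketch under stated assumptions: run, onboard_mysql_host, and the DRY_RUN variable are hypothetical names introduced here, while the pmm-admin invocations are the ones used in this post, followed by pmm-admin list to show what was registered.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of pmm-admin) around the commands in this post.

# With DRY_RUN=1, print the command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Register Linux metrics, MySQL metrics and the slow-log based query agent
# for a local MySQL on port 3306, then list the registered services.
onboard_mysql_host() {
  local user="$1" password="$2"
  run pmm-admin add linux:metrics --service-port=42000
  run pmm-admin add mysql:metrics --service-port=42002 -u "$user" -p "$password" -h localhost -P 3306 --disable-tablestats=true
  run pmm-admin add mysql:queries -u "$user" -p "$password" -h localhost -P 3306 --query-source=slowlog
  run pmm-admin list
}
```

Running DRY_RUN=1 onboard_mysql_host pmm xxx only prints the four commands, so they can be reviewed before a real run. Keep in mind that the preview echoes the password in clear text.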

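If the database is an AWS RDS instance, we cannot install pmm-client on the underlying host to collect Linux system metrics. However, PMM ships with the CloudWatch plugin preinstalled, so CloudWatch data, which covers some system and MySQL metrics, can be displayed in PMM's Grafana through the AWS API. Adding CloudWatch to PMM requires an access key for the corresponding account, with the following account policy: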

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1508404837000",
      "Effect": "Allow",
      "Action": [
        "rds:DescribeDBInstances",
        "cloudwatch:GetMetricStatistics",
        "cloudwatch:ListMetrics"
      ],
      "Resource": ["*"]
    },
    {
      "Sid": "Stmt1508410723001",
      "Effect": "Allow",
      "Action": [
        "logs:DescribeLogStreams",
        "logs:GetLogEvents",
        "logs:FilterLogEvents"
      ],
      "Resource": ["arn:aws:logs:*:*:log-group:RDSOSMetrics:*"]
    }
  ]
}

Pros and Cons

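Percona's own dashboards are very rich. Combined with Grafana's highly flexible and polished visualizations, they cover essentially every necessary system and database metric, along with many that you may never need. Add the low learning curve and the out-of-the-box experience, and PMM is a very good fit for quickly building a comprehensive MySQL monitoring system.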

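On the other hand, PMM's main strength lies in its friendly packaging of Prometheus; features such as QAN and Orchestrator are currently of marginal value and not very practical. And precisely because PMM wraps Prometheus with a large amount of predefined configuration, some Prometheus features are restricted or even unusable in PMM, and problems caused by this repackaging are hard to avoid. Once you need to customize the monitoring system in depth, for example tuning and scaling Prometheus, HA, or custom exporters, you will end up learning Prometheus anyway, and by that point there is little room left to extend PMM.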

Reference

https://www.percona.com/doc/percona-server/LATEST/installation/docker.html

https://www.percona.com/doc/percona-monitoring-and-management/deploy/index.html

https://www.percona.com/doc/percona-monitoring-and-management/pmm-admin.html

https://www.percona.com/doc/percona-monitoring-and-management/amazon-rds.html

https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html

https://www.percona.com/doc/percona-monitoring-and-management/faq.html