Kubernetes Monitoring
When your application is deployed to Kubernetes, it becomes challenging to see what is happening inside the containers. Once a container dies, the data inside it may be permanently lost, and you might not even be able to view the logs to pinpoint the issue. Moreover, an application may have many instances, and a user request does not specify which container will handle it. This complexity makes troubleshooting applications in Kubernetes more difficult. Outside of the applications, Kubernetes manages the lifecycle of the entire cluster as the underlying infrastructure; thus, any failure in Kubernetes will undoubtedly impact the operational services of applications. Therefore, monitoring the health of Kubernetes is also crucial.
When your applications are cloud-native, you must pay attention to the operational status of the various servers and middleware, of every component and resource object in Kubernetes, and of each application. "Operational status" is, of course, a loose concept that depends on what we care about, and each monitored object expresses its state differently. For us to monitor the objects we care about, those objects need to cooperate by exposing suitable status information for collection and analysis; this property is referred to as observability.
In cloud-native environments, observability is generally divided into three main scopes: Metrics, Tracing, and Logging.
You can learn how to monitor, debug, and handle logs in the Kubernetes documentation:
https://v1-20.docs.kubernetes.io/docs/tasks/debug-application-cluster/
In this article, monitoring refers only to Metrics.
Metrics, Tracing, and Logging are not entirely independent. In the figure above, Metrics may also include information from Logging and Tracing.
Monitoring Targets
The monitoring data to be collected comes from the monitored objects, and in a Kubernetes cluster, we can categorize the objects we want to monitor into three major parts:
- Machines: All node machines in the cluster, with metrics such as CPU and memory usage, network and disk IO rates, etc.;
- Kubernetes Object Status: The statuses and some metrics information of objects like Deployments, Pods, DaemonSets, StatefulSets, etc.;
- Applications: The statuses or metrics of each container in the Pods, as well as the /metrics endpoints that the containers may expose.
Prometheus
In a basic monitoring environment, a complete monitoring setup should include data collection, data storage, data analysis, data display, and alerting, with relevant tools or technologies for each to address the diverse needs and complexities of a cloud-native environment.
Since monitoring is necessary, monitoring tools are required. A monitoring tool captures all of the important metrics and logs (Metrics may also encompass some logs) and stores them in a secure, centralized location so they can be consulted when working out solutions to issues. Because applications are deployed in a Kubernetes cluster, monitoring Kubernetes gives deep insight into the cluster's health and performance metrics, resource counts, and general internal conditions. When errors occur, the monitoring tool alerts you (alerting functionality) so that you can quickly roll out fixes.
Prometheus is a CNCF project that can natively monitor Kubernetes, nodes, and Prometheus itself. The current Kubernetes official documentation mainly recommends Prometheus, as it provides out-of-the-box monitoring capabilities for the Kubernetes container orchestration platform. The monitoring solution in this article is therefore designed around Prometheus.
Here are some components of Prometheus:
- Metric Collection: Prometheus fetches metrics using a pull model over HTTP. For targets that Prometheus cannot scrape directly (for example, short-lived jobs), metrics can be pushed to a Pushgateway, which Prometheus then scrapes.
- Metric Endpoint: Systems intended for monitoring with Prometheus should expose metrics at a specific /metrics endpoint, which Prometheus scrapes at regular intervals.
- PromQL: Prometheus comes with PromQL, a very flexible query language used to query metrics from the Prometheus dashboard. Additionally, the Prometheus UI and Grafana use PromQL queries to visualize metrics (see the query sketch after this list).
- Prometheus Exporters: There are many libraries and servers to help export existing metrics from third-party systems as Prometheus metrics. This is particularly useful for cases where direct usage of Prometheus metric detection for a given system is not possible.
- TSDB (Time-Series Database): Prometheus uses a TSDB to efficiently store all data. By default, all data is stored locally, but to avoid single points of failure, Prometheus TSDB can be optionally integrated with remote storage.
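As a hedged taste of PromQL (the metric names below come from node exporter and cAdvisor, so they assume those exporters are being scraped, as set up later in this article), two typical queries look like this:
# Per-node CPU usage rate over the last 5 minutes
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
# Total container memory usage per namespace
sum(container_memory_usage_bytes) by (namespace)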
The monitoring architecture of Prometheus in Kubernetes is shown below:
【Image Source: https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/】
Metrics
There are many types of objects to monitor, and similar types of objects are referred to as an entity. The data generated by each entity during its operation has various forms. To categorize and collect this data, Prometheus divides the various attribute values in an entity into four types: Counter, Gauge, Histogram, and Summary. Each attribute within the entity is referred to as a metric; for example, cumulative CPU usage by a container is recorded with the metric name container_cpu_usage_seconds_total.
The general format of each metric is:
metric_name{metadata=value} metric_value
Each object generates data continuously. To tell which object a given metric value belongs to, a rich set of metadata labels can be attached to the metric in addition to its value. An example is shown below.
container_cpu_usage_seconds_total{
beta_kubernetes_io_arch = "amd64",
beta_kubernetes_io_os = "linux",
container = "POD",
cpu = "total",
id = "...",
image = "k8s.gcr.io/pause:3.5",
instance = "slave1",
job = "kubernetes-cadvisor",
kubernetes_io_arch = "amd64",
kubernetes_io_hostname = "slave1",
kubernetes_io_os = "linux",
name = "k8s_POD_pvcpod_default_02ed547b-6279-4346-8918-551b87877e91_0",
namespace = "default",
pod = "pvcpod"
}
Once the object generates text in this structure, it can expose metrics endpoints for Prometheus to scrape automatically or be pushed to Prometheus via Pushgateway.
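To illustrate what exposing a metrics endpoint looks like from the application side, here is a minimal sketch in Go using the official prometheus/client_golang library; the metric name demo_requests_total and the port 2112 are arbitrary choices for this example rather than anything from the deployments in this article:
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestsTotal is a hypothetical Counter; Prometheus scrapes its current value
// from /metrics together with the default Go runtime metrics.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
	Name: "demo_requests_total",
	Help: "Total number of handled demo requests.",
})

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.Inc() // each request increments the counter
		w.Write([]byte("ok"))
	})
	// Expose all registered metrics in the Prometheus text format at /metrics.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}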
Next, we will build a complete Prometheus monitoring system in Kubernetes.
Practice
Node Monitoring
Reference for this chapter: https://devopscube.com/node-exporter-kubernetes/
Node exporter is written in Go and is used to collect all hardware and OS-level metrics exposed by the kernel on Linux systems, including CPU, memory, network traffic, system load, sockets, machine configuration, etc.
Readers can refer to https://github.com/prometheus/node_exporter for the list of metrics enabled or disabled by default.
Since we want to monitor every node in the cluster, we need to ensure that a node exporter instance runs on each node, and when new nodes are added to the cluster, a node exporter should automatically be scheduled to run on those nodes. Thus, the deployment of node exporter needs to use DaemonSet mode.
View all nodes in the cluster:
root@master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready,SchedulingDisabled control-plane,master 98d v1.22.2
slave2 Ready <none> 3h50m v1.23.3
slave1 Ready <none> 98d v1.22.2
Bibin Wilson has packaged up a YAML file for node exporter for Kubernetes, which we can directly download:
git clone https://github.com/bibinwilson/kubernetes-node-exporter
Open the daemonset.yaml file in the repository to understand the information it contains.
In the YAML file, you can see that the node exporter will be deployed in the monitoring namespace and carries two labels:
labels:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
To allow the node exporter to be scheduled on the master node, we need to add toleration attributes to the Pod:
template:
metadata:
labels:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
spec:
# Copy the following section to the corresponding location
tolerations:
- key: "node-role.kubernetes.io/master"
operator: "Exists"
effect: "NoSchedule"
- key: "node.kubernetes.io/unschedulable"
operator: "Exists"
effect: "NoSchedule"
To deploy the node exporter, we first create the namespace:
kubectl create namespace monitoring
Execute the command to deploy the node exporter:
kubectl create -f daemonset.yaml
Check the node exporter instances:
root@master:~# kubectl get daemonset -n monitoring
NAME            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
node-exporter   3         3         3       3            3           <none>          22h
Since the node exporter Pods are distributed across the nodes, we create a Service whose Endpoints gather the IPs of these Pods in one place, which makes it easy for Prometheus to discover and scrape them.
Check the service.yaml file in the repository, which is defined as follows:
kind: Service
apiVersion: v1
metadata:
name: node-exporter
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9100'
spec:
selector:
app.kubernetes.io/component: exporter
app.kubernetes.io/name: node-exporter
ports:
- name: node-exporter
protocol: TCP
port: 9100
targetPort: 9100
The selector for this Service is as follows:
selector:
  app.kubernetes.io/component: exporter
  app.kubernetes.io/name: node-exporter
Create the Service:
kubectl create -f service.yaml
Check the collected node exporter Pod IPs in the Endpoint:
root@master:~# kubectl get endpoints -n monitoring
NAME ENDPOINTS AGE
node-exporter 10.32.0.27:9100,10.36.0.4:9100,10.44.0.3:9100 22h
The node exporter simply collects various metrics data and does nothing else.
Deploying Prometheus
Reference for this section: https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/
Now that we have the node exporter to collect various metrics from the nodes, the next step is to collect metrics data from the Kubernetes infrastructure.
Kubernetes itself provides many metrics through three major endpoints: /metrics/cadvisor, /metrics/resource, and /metrics/probes.
Taking the /metrics/cadvisor endpoint as an example, cAdvisor analyzes the memory, CPU, file, and network usage metrics of all containers running on a given node. You can refer to https://github.com/google/cadvisor/blob/master/docs/storage/prometheus.md to understand all metrics collected by cAdvisor.
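If you want to peek at these endpoints directly, the kubelet metrics can be read through the API server proxy; <node-name> below is a placeholder for one of your own nodes:
# Ask the API server to proxy the cAdvisor metrics of a specific node
kubectl get --raw /api/v1/nodes/<node-name>/proxy/metrics/cadvisor | head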
Additional resources:
Source code location: https://github.com/kubernetes/metrics/blob/master/pkg/apis/metrics/v1beta1/types.go
Kubernetes Monitoring Architecture Design: https://github.com/kubernetes/design-proposals-archive
In this section, the deployed Prometheus will take the following actions on Kubernetes to collect metrics data:
- Kubernetes API Servers: obtain all metrics from the API server;
- Kubernetes Nodes: collect all metrics from the Kubernetes nodes;
- kubernetes-pods: Pods whose metadata carries the prometheus.io/scrape and prometheus.io/port annotations will be discovered and their metrics scraped (see the annotation example after this list);
- kubernetes-cadvisor: collect all cAdvisor metrics related to the containers;
- Kubernetes Service Endpoints: if a Service's metadata carries the prometheus.io/scrape and prometheus.io/port annotations, all of its endpoints will be scraped.
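As a hedged illustration of those annotations, a Pod (or the Pod template of a Deployment) that wants to be scraped could declare something like the following; the port value 8080 is only an example and must match the port on which your application actually serves /metrics:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"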
Bibin Wilson has packaged the deployment definition files, which we can directly download:
git clone https://github.com/bibinwilson/kubernetes-prometheus
Prometheus uses the Kubernetes API Server to obtain metrics from all available nodes, Pods, Deployments, and so on. Therefore, we need to create an RBAC policy with only read access to the required API group and bind that policy to the monitoring namespace, limiting the Prometheus Pods to perform read operations only on the API.
Check the clusterRole.yaml file to see the list of resource objects it should monitor:
- apiGroups: [""]
resources:
- nodes
- nodes/proxy
- services
- endpoints
- pods
verbs: ["get", "list", "watch"]
- apiGroups:
- extensions
resources:
- ingresses
Create the role and role binding in the cluster:
kubectl create -f clusterRole.yaml
Prometheus can be configured through command-line flags and configuration files. While command-line flags set immutable system parameters (such as storage location and how much data to keep on disk and in memory), the configuration file defines all the scrape jobs and their instances as well as which rule files to load. Hence, file configuration is vital when deploying Prometheus.
The Prometheus configuration file is written in YAML format, and you can refer to the specifics here: https://prometheus.io/docs/prometheus/latest/configuration/configuration/.
To map the configuration file into the Prometheus Pod, we place the configuration in a ConfigMap and then mount it into the Pod. The configuration content can be reviewed in config-map.yaml, which defines many scrape rules, such as collecting data from the Kubernetes cluster and from the node exporter. For example:
scrape_configs:
- job_name: 'node-exporter'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_endpoints_name]
regex: 'node-exporter'
action: keep
You can open https://raw.githubusercontent.com/bibinwilson/kubernetes-prometheus/master/config-map.yaml to preview this file online.
Create the ConfigMap:
kubectl create -f config-map.yaml
This configuration is crucial and needs to be customized based on the practical situation, typically handled by operations; we will not discuss it further here.
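If you do adjust the configuration, it is worth validating it before updating the ConfigMap; Prometheus ships a promtool utility for this. The file name below is assumed to be wherever you saved the prometheus.yml content extracted from the ConfigMap:
promtool check config prometheus.yml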
Next, we will deploy Prometheus. The example files use an emptyDir volume to store Prometheus data, which means the data is lost whenever the Pod restarts, so we can change it to a hostPath volume.
Open the prometheus-deployment.yaml file and change
emptyDir: {}
to
hostPath:
  path: /data/prometheus
  type: Directory
This change is optional. If you make it, the /data/prometheus directory must be created in advance on the node where the Pod is scheduled.
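As a small sketch of that preparatory step (the path matches the hostPath above; the permissive chmod is just a quick way to let the non-root Prometheus container write to the directory and can be replaced by a proper chown):
mkdir -p /data/prometheus
chmod 777 /data/prometheus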
Deploy Prometheus:
kubectl create -f prometheus-deployment.yaml
Check the deployment status:
root@master:~# kubectl get deployments --namespace=monitoring
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-deployment   1/1     1            1           23h
To access Prometheus from outside the cluster, we need to create a Service:
apiVersion: v1
kind: Service
metadata:
name: prometheus-service
namespace: monitoring
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '9090'
spec:
selector:
app: prometheus-server
type: NodePort
ports:
- port: 8080
targetPort: 9090
nodePort: 30000
kubectl create -f prometheus-service.yaml
You can now access the Prometheus UI.
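A quick, hedged way to confirm that Prometheus is reachable and is scraping its targets is to call its HTTP API through the NodePort; <node-ip> below is a placeholder for any node IP in your cluster:
# List the current scrape targets and their health
curl http://<node-ip>:30000/api/v1/targets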
Click Graph, click the 🌏 icon, select the metric you want to display, and then click Execute to run the query and show the result.
Under Service Discovery you can also view the metric sources that Prometheus scrapes.
If kube-state-metrics is not installed in your cluster, this data source will be marked in red; we will deploy that component in the next section.
At this point, our monitoring structure looks like this:
Deploying Kube State Metrics
Reference for this section: https://devopscube.com/setup-kube-state-metrics/
Kube State Metrics is a service that talks to the Kubernetes API Server to obtain the details of all API objects, such as Deployments and Pods.
Kube State Metrics provides Kubernetes object and resource metrics that the native Kubernetes monitoring components cannot supply directly; the metrics exposed by Kubernetes itself are not very comprehensive, so Kube State Metrics is needed to obtain all the metrics related to Kubernetes objects.
Here are some of the important metrics available from Kube State Metrics:
- Node status, node capacity (CPU and memory)
- Replica-set compliance (desired/available/unavailable/updated status of replicas per deployment)
- Pod status (waiting, running, ready, etc)
- Ingress metrics
- PV, PVC metrics
- Daemonset & Statefulset metrics.
- Resource requests and limits.
- Job & Cronjob metrics
The detailed list of supported metrics is documented here: https://github.com/kubernetes/kube-state-metrics/tree/master/docs
Bibin Wilson has packaged the relevant deployment definition files, which we can directly download:
git clone https://github.com/devopscube/kube-state-metrics-configs.git
Apply all the YAML files to create the corresponding resources:
kubectl apply -f kube-state-metrics-configs/
├── cluster-role-binding.yaml
├── cluster-role.yaml
├── deployment.yaml
├── service-account.yaml
└── service.yaml
The resources created above include the following parts, which we will not expand on in this section:
- Service Account
- Cluster Role
- Cluster Role Binding
- Kube State Metrics Deployment
- Service
Check the deployment status with the following command:
kubectl get deployments kube-state-metrics -n kube-system
Then refresh the Prometheus Service Discovery page; the red mark turns blue. Clicking this data source shows the following information:
- job_name: 'kube-state-metrics'
static_configs:
- targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
This configuration specifies the access address of kube-state-metrics.
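Once kube-state-metrics is being scraped, its object-level metrics become queryable in PromQL. As a hedged example (the exact label values depend on your cluster), the following queries count Pods per phase and show available replicas for the Deployments in the monitoring namespace:
sum(kube_pod_status_phase) by (phase)
kube_deployment_status_replicas_available{namespace="monitoring"}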
At this point, the structure of our Prometheus deployment looks like this:
Deploying Grafana
Reference for this section: https://devopscube.com/setup-grafana-kubernetes/
With the previous sections in place, metric collection and storage are ready; next we deploy Grafana to analyze and visualize the metric data.
Bibin Wilson has packaged the relevant deployment definition files, which we can directly download:
git clone https://github.com/bibinwilson/kubernetes-grafana.git
First look at the grafana-datasource-config.yaml file; this configuration automatically sets up the Prometheus data source for Grafana.
It contains one very important address:
"url": "http://prometheus-service.monitoring.svc:8080",
Here you need to confirm that CoreDNS is working properly. You can follow the DNS debugging steps listed at https://kubernetes.io/zh/docs/tasks/administer-cluster/dns-debugging-resolution/ to confirm whether Pods in your cluster can be reached through DNS. The simplest method is to start a Pod and test with curl http://prometheus-service.monitoring.svc:8080 to see whether you get a response. If you see:
root@master:~/jk/kubernetes-prometheus# curl http://prometheus-deployment.monitoring.svc:8080
curl: (6) Could not resolve host: prometheus-deployment.monitoring.svc
root@master:~/jk/kubernetes-prometheus# curl http://prometheus-deployment.monitoring.svc.cluster.local:8080
curl: (6) Could not resolve host: prometheus-deployment.monitoring.svc.cluster.local
then it may be that CoreDNS is not installed, or some other issue prevents reaching Prometheus through this address. To avoid extra work, you can switch to using the IP instead of the domain name.
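If you prefer to troubleshoot the DNS side first, a quick, hedged check is to run a throwaway Pod and resolve the Service name from inside it; busybox:1.28 is used here only because its nslookup behaves well, and the Service name must match whatever your Prometheus Service is actually called:
kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup prometheus-service.monitoring.svc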
Check the Service IP of Prometheus:
root@master:~/jk/kubernetes-prometheus# kubectl get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-deployment NodePort 10.105.95.8 <none> 9090:32330/TCP 23h
Test whether access through the Service IP works:
root@master:~/jk/kubernetes-prometheus# curl 10.105.95.8:9090
<a href="/graph">Found</a>.
Change prometheus-deployment.monitoring.svc.cluster.local:8080 in grafana-datasource-config.yaml to the corresponding Service IP, and change the port to 9090.
Create the configuration:
kubectl create -f grafana-datasource-config.yaml
Open deployment.yaml to review the definition. In the template, Grafana also stores its data on an emptyDir volume, which risks data loss, so you can change it to a hostPath or another volume type. You may refer to my configuration:
volumes:
- name: grafana-storage
hostPath:
path: /data/grafana
type: Directory
Deploy Grafana:
kubectl create -f deployment.yaml
Then create the Service:
kubectl create -f service.yaml
You can then access Grafana on port 32000.
The default username and password are both admin.
At this point, our Prometheus monitoring structure looks like this:
The dashboard is empty right after logging in; we need to use dashboard templates to build visualizations that present the data nicely.
The Grafana website hosts many free, community-made dashboards: https://grafana.com/grafana/dashboards/?search=kubernetes
First open https://grafana.com/grafana/dashboards/8588 to download this template, then upload the template file and bind it to the corresponding Prometheus data source.
You can then see the corresponding monitoring dashboard.
You can open Browse and import more templates, then view the dashboards you want to display.
How Applications Integrate with Prometheus and Grafana
We have already covered monitoring the infrastructure; we can also generate and collect metrics for middleware such as TiDB and MySQL, and define custom metrics in our programs and build our own Grafana dashboards. If you are a .NET developer, you can refer to another article of mine that walks through these steps: https://www.cnblogs.com/whuanle/p/14969982.html
Alerting
In a monitoring system, alerting is of paramount importance; alert handling and notification components generally need to be developed in-house according to the company's actual situation.
We recommend reading "My Philosophy on Alerting", based on Rob Ewaschuk's observations at Google: https://docs.google.com/a/boxever.com/document/d/199PqyG3UsyXlwieHaqbGiWVa8eMWi8zzAn0YfcApr8Q/edit
When we deployed Prometheus earlier, config-map.yaml already defined an alert rule.
prometheus.rules: |-
groups:
- name: devopscube demo alert
rules:
- alert: High Pod Memory
expr: sum(container_memory_usage_bytes) > 1
for: 1m
labels:
severity: slack
annotations:
summary: High Memory Usage
An alert rule mainly consists of the following parts:
- alert: the name of the alert rule.
- expr: the alert trigger condition as a PromQL expression, used to evaluate whether any time series satisfies the condition.
- for: the evaluation wait time, an optional parameter. The alert fires only after the trigger condition has held for this duration; while waiting, newly triggered alerts are in the pending state.
- labels: custom labels, allowing the user to attach an additional set of labels to the alert.
- annotations: an additional set of information, such as text describing the alert in detail; the contents of annotations are sent to Alertmanager as parameters when the alert fires.
See also: https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule
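If you add or modify rules in the ConfigMap, promtool can also lint the rule file before you apply it; the file name below is assumed to be wherever you saved the prometheus.rules content:
promtool check rules prometheus.rules.yml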
This rule is also visible in Grafana.
Next, we will configure alert notifications.
First create an alert contact point; I used a DingTalk webhook.
Then go to Alert Rules and add a new alert rule.
Next, open Notification policies and bind the alert rule to the contact point; matching alerts will be pushed to that contact point.
In Alert Rules you can see the push history of alerts. Because my server is located abroad, it may be unable to use DingTalk's webhook, so the alert stays in Pending; I will not experiment further here. Readers only need to understand the general steps.