Alauda Container Platform

Capacity Planning for Monitoring Components

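The monitoring components store the metrics collected from one or more clusters on the platform. Assess your monitoring scale in advance and plan the resources the monitoring components need according to the guidelines in this document.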

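Contents

Assumptions and Methodology
Prometheus
  Small scale: 10 worker nodes, 500 two-container Pods
  Medium scale: 50 worker nodes, 2000 two-container Pods
  Large scale: 500 worker nodes, 10000 two-container Pods
VictoriaMetrics
  Small scale: 10 worker nodes, 500 two-container Pods
  Medium scale: 50 worker nodes, 2000 two-container Pods
  Large scale: 500 worker nodes, 10000 two-container Pods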

Assumptions and Methodology

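The data in this document comes from controlled lab performance reports and is intended as a capacity baseline for production planning.
The disk examples assume a 7-day retention period; scale them proportionally for other retention targets.
The storage baseline matches the warning above (SSD, about 6000 IOPS, about 250 MB/s read/write, dedicated mount).
The test workload covers typical monitoring pages such as "acp ns overview page" and "platform region detail page".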
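
The proportional retention adjustment mentioned above can be sketched as follows. This is a minimal illustration rather than a platform tool: the function name `scale_disk_for_retention` is hypothetical, and the sketch assumes disk usage grows linearly with retention time, which is how "scale proportionally" is read here.

```python
BASELINE_RETENTION_DAYS = 7  # retention period used by the disk examples in this document


def scale_disk_for_retention(baseline_disk_g: float, retention_days: float) -> float:
    """Scale a 7-day disk figure linearly to another retention period.

    Assumption: disk usage grows linearly with retention time.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    return baseline_disk_g * retention_days / BASELINE_RETENTION_DAYS


# Small-scale Prometheus example: 20G of disk for 7 days of retention.
print(scale_disk_for_retention(20, 14))  # 40.0
```

Treat the result as a starting point and validate it against actual disk growth in your environment.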

Prometheus

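The following are capacity recommendations, by scale, for Prometheus and its related components (Thanos Query, Thanos Sidecar, and so on).

Small scale: 10 worker nodes, 500 two-container Pods

Metric ingestion rate: about 2800 samples/s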

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	2	2C	4Gi	-	-
kube-prometheus-thanos-query	thanos-query	1	1C	1Gi	-	-
prometheus-kube-prometheus-0	prometheus	1	2C	8Gi	20G	About 10G written in 7 days

Medium scale: 50 worker nodes, 2000 two-container Pods

Metric ingestion rate: about 7294 samples/s

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	2	4C	4Gi	-	-
kube-prometheus-thanos-query	thanos-query	1	2.5C	8Gi	-	-
prometheus-kube-prometheus-0	prometheus	1	4C	8Gi	40G	About 30G written in 7 days

Large scale: 500 worker nodes, 10000 two-container Pods

Metric ingestion rate: about 41575 samples/s

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	2	6C	4Gi	-	-
kube-prometheus-thanos-query	thanos-query	1	2C	6Gi	-	Field deployments may use 2 replicas
prometheus-kube-prometheus-0	prometheus	1	8C	20Gi	100G	Peak memory about 15Gi; about 69G written in 7 days
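
One way to use the Prometheus tables above is to pick the smallest tested tier that covers your cluster. The sketch below is only illustrative: the `Tier` type and `pick_tier` function are hypothetical names, and rounding up to the next tested tier is an assumption rather than a rule stated by the test reports. The values are copied from the `prometheus` container rows.

```python
from typing import NamedTuple, Optional


class Tier(NamedTuple):
    name: str
    worker_nodes: int
    pods: int
    prometheus_cpu: str
    prometheus_memory: str
    prometheus_disk: str


# Values copied from the Prometheus tables in this document.
PROMETHEUS_TIERS = [
    Tier("small", 10, 500, "2C", "8Gi", "20G"),
    Tier("medium", 50, 2000, "4C", "8Gi", "40G"),
    Tier("large", 500, 10000, "8C", "20Gi", "100G"),
]


def pick_tier(worker_nodes: int, pods: int) -> Optional[Tier]:
    """Return the smallest tested tier covering both counts, or None.

    Assumption: rounding up to the next tested tier is a reasonable start.
    """
    for tier in PROMETHEUS_TIERS:
        if worker_nodes <= tier.worker_nodes and pods <= tier.pods:
            return tier
    return None  # larger than anything tested in this document


tier = pick_tier(worker_nodes=30, pods=1200)
print(tier.name if tier else "beyond tested scale")  # medium
```

A cluster larger than the large tier returns `None`, because this document provides no measured baseline beyond that scale.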

VictoriaMetrics

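The following are capacity recommendations, by scale, for the VictoriaMetrics components.

Small scale: 10 worker nodes, 500 two-container Pods

Metric ingestion rate: about 3274 samples/s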

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	1	2C	4Gi	-	-
vmselect-cluster	proxy	1	1C	200Mi	-	-
vmselect	vmselect	1	500m	1Gi	-	-
vmstorage-cluster	vmstorage	1	500m	2Gi	3G	About 1.5G written in 7 days

Medium scale: 50 worker nodes, 2000 two-container Pods

Metric ingestion rate: about 6940 samples/s

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	2	4C	4Gi	-	-
vmselect-cluster	proxy	1	1C	200Mi	-	-
vmselect	vmselect	1	2C	2Gi	-	-
vmstorage-cluster	vmstorage	1	2C	2Gi	10G	About 2.6G written in 7 days

Large scale: 500 worker nodes, 10000 two-container Pods

Metric ingestion rate: about 34300 samples/s

Component	Container	Replicas	CPU Limit	Memory Limit	Disk (if applicable)	Notes
courier-api	courier	2	6C	4Gi	-	-
vmselect-cluster	proxy	1	2C	200Mi	-	-
vmselect	vmselect	1	5C	3Gi	-	-
vmstorage-cluster	vmstorage	1	2C	6Gi	30G	About 16.8G written in 7 days
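
To extrapolate disk needs for an ingestion rate that falls between the tested tiers, it can help to derive an approximate bytes-per-sample figure from the tables. The sketch below is a rough aid, not a measured result: it assumes that 1G means 10^9 bytes and that the ingestion rate stayed constant over the 7-day window, and the function name `bytes_per_sample` is hypothetical.

```python
SECONDS_PER_7_DAYS = 7 * 24 * 3600

# (label, ingestion rate in samples/s, volume written in 7 days in G),
# copied from the tables in this document.
MEASUREMENTS = [
    ("Prometheus small", 2800, 10),
    ("Prometheus medium", 7294, 30),
    ("Prometheus large", 41575, 69),
    ("VictoriaMetrics small", 3274, 1.5),
    ("VictoriaMetrics medium", 6940, 2.6),
    ("VictoriaMetrics large", 34300, 16.8),
]


def bytes_per_sample(samples_per_second: float, written_g: float) -> float:
    """Approximate on-disk bytes per ingested sample over the 7-day window.

    Assumptions: 1G = 10**9 bytes and a constant ingestion rate.
    """
    total_samples = samples_per_second * SECONDS_PER_7_DAYS
    return written_g * 10**9 / total_samples


for label, rate, written in MEASUREMENTS:
    print(f"{label}: {bytes_per_sample(rate, written):.2f} bytes/sample")
```

Under these assumptions, the VictoriaMetrics figures come out below 1 byte per sample and the Prometheus figures at a few bytes per sample. Multiplying such a figure by your own ingestion rate and retention time gives a first disk estimate to check against the nearest tested tier.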