Creating a Cluster
This document provides a complete guide to creating a Kubernetes cluster on the DCS platform with Cluster API. The process involves deploying and configuring several Kubernetes resources that work together to provision and manage the cluster infrastructure.
Prerequisites
Before creating a cluster, make sure all of the following prerequisites are met:
1. Required Plugins
Install the following plugins on the global cluster of Alauda Container Platform:
- Cluster API Provider Kubeadm - provides Kubernetes cluster bootstrapping capabilities
- Cluster API Provider DCS - enables DCS infrastructure integration and management
For detailed installation steps, see the installation guide.
2. Virtual Machine Template Preparation
For the Kubernetes installation, you need to:
- Upload the MicroOS image provided by Alauda Container Platform to the DCS platform
- Create a virtual machine template based on that image
- Ensure the template includes all required Kubernetes components
For details on the Kubernetes components included in each virtual machine image, see the operating system support matrix.
3. Network Connectivity
Ensure that all nodes in the global cluster of Alauda Container Platform can reach the following ports on the DCS platform:
- Port 7443 (DCS API)
- Port 8443 (DCS web console)
Requirement: both ports must be reachable for cluster creation and management.
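As a quick sanity check, you can probe both ports from a node in the global cluster before proceeding (a minimal sketch; <dcs-host> stands for your DCS platform address):

# Probe the DCS API (7443) and web console (8443) ports.
for port in 7443 8443; do
  nc -z -w 5 <dcs-host> "$port" && echo "port $port reachable" || echo "port $port NOT reachable"
done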
4. DCS Platform Credentials
The DCS platform credentials used for cluster creation must meet specific requirements.
User Configuration Requirements
- User type: must be an Interface interconnection user
- Role: must be administrator
Password Policy Requirements
Confirm the following setting under System Management → Permission Management → Permission Management Policy:
- Policy: whether interface interconnection users must change their password after a reset or on first login
- Value: must be set to No
Impact: if set to Yes, the user is forced to change the password on first login, which causes authentication to fail and, in turn, cluster creation to fail.
5. Public Image Registry Configuration
Configure the public image registry credentials on Alauda Container Platform.
For detailed configuration steps, see the Alauda Container Platform documentation: Configure → Clusters → How To → Update Public Registry Credentials.
Cluster Creation Overview
At a high level, you will create the following Cluster API resources in the global cluster of Alauda Container Platform to provision the infrastructure and bootstrap the Kubernetes cluster.
WARNING
Important Namespace Requirement
To ensure correct integration with Alauda Container Platform as a workload cluster, all resources must be created in the cpaas-system namespace. Deploying them in any other namespace may break the integration.
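Before applying any of the manifests below, it is worth confirming that the namespace exists on the global cluster:

# All Cluster API resources in this guide are created in cpaas-system.
kubectl get namespace cpaas-system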
Control Plane Configuration
The control plane manages cluster state, scheduling, and the Kubernetes API. The following sections show how to configure a highly available control plane.
WARNING
Notes on Configuration Parameters
Modify parameters with care when configuring these resources:
- Replace only the values wrapped in <> with the actual values for your environment
- Keep all other parameters unchanged; they are tuned or required settings
- Changing non-placeholder parameters may destabilize the cluster or break the integration
Configuration Workflow
Follow these steps in order:
- Plan the network and deploy the API load balancer
- Configure the DCS credentials (Secret)
- Create the IP and hostname pool
- Create the control plane DCSMachineTemplate
- Configure the KubeadmControlPlane
- Configure the DCSCluster
- Create the Cluster
After the manifests are applied, Alauda Container Platform creates the DCS Kubernetes control plane.
Network Planning and Load Balancer
Before creating the control plane resources, plan the network layout and deploy a load balancer to achieve high availability.
Requirements
- Network planning: plan the IP address range for the control plane nodes
- Load balancer: deploy and configure one for access to the API server
- IP binding: bind the load balancer to an IP from the control plane IP pool
- Connectivity: ensure network connectivity between all components
The load balancer distributes API server traffic across the control plane nodes, providing availability and fault tolerance.
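No specific load balancer product is required; any TCP load balancer that forwards port 6443 to the control plane IPs works. Once the control plane machines are up, a quick way to confirm the balancer is forwarding API traffic (a sketch; the placeholder matches the DCSCluster configuration below):

# Expect an HTTP response from the API server (an unauthenticated
# 401/403 is fine); a timeout means the LB is not forwarding.
curl -k -m 5 https://<load-balancer-ip-or-domain-name>:6443/healthz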
Configuring DCS Credentials
The DCS credentials are stored in a Secret resource.
In the following example, <auth-secret-name> is the name of the Secret to be saved:
apiVersion: v1
data:
  authUser: <base64-encoded-auth-user>
  authKey: <base64-encoded-auth-key>
  endpoint: <base64-encoded-endpoint>
kind: Secret
metadata:
  name: <auth-secret-name>
  namespace: cpaas-system
type: Opaque
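The values under data must be base64-encoded. A minimal sketch for producing them (the endpoint URL format is an assumption; use the address of your DCS API):

# -n prevents a trailing newline from being encoded into the value.
echo -n '<auth-user>' | base64
echo -n '<auth-key>' | base64
echo -n 'https://<dcs-host>:7443' | base64   # endpoint format is an assumption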
Configuring the IP and Hostname Pool
Plan the IP addresses, hostnames, DNS servers, and other network settings for the control plane virtual machines in advance.
In the following example, <control-plane-iphostname-pool-name> is the resource name:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSIpHostnamePool
metadata:
  name: <control-plane-iphostname-pool-name>
  namespace: cpaas-system
spec:
  pool:
    - ip: "<control-plane-ip-1>"
      mask: "<control-plane-mask>"
      gateway: "<control-plane-gateway>"
      dns: "<control-plane-dns>"
      hostname: "<control-plane-hostname-1>"
      machineName: "<control-plane-machine-name-1>"
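The example shows a single pool entry. Since the KubeadmControlPlane below uses replicas: 3, the pool will typically need one entry per control plane machine; additional entries follow the same shape (all values here are placeholders):

    - ip: "<control-plane-ip-2>"
      mask: "<control-plane-mask>"
      gateway: "<control-plane-gateway>"
      dns: "<control-plane-dns>"
      hostname: "<control-plane-hostname-2>"
      machineName: "<control-plane-machine-name-2>"
    - ip: "<control-plane-ip-3>"
      mask: "<control-plane-mask>"
      gateway: "<control-plane-gateway>"
      dns: "<control-plane-dns>"
      hostname: "<control-plane-hostname-3>"
      machineName: "<control-plane-machine-name-3>"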
Configuring the Machine Template (Control Plane)
The DCS machine template declares the configuration of the DCS machines that the other Cluster API components will create later. It specifies the VM template, additional disks, CPU, memory, and other settings.
WARNING
Storage Requirements
Cross-host access to the datastore cluster
The datastore cluster (datastoreClusterName) must be accessible from all physical hosts on the DCS platform.
- Reason: if the datastore is available only on some hosts, creation fails whenever the DCS platform schedules a VM onto another host.
Shared storage for Ignition
If the datastore does not support direct file upload (required for the Ignition configuration), you must provide a shared storage solution that supports multi-host mounting (such as NFS).
Disk configuration rules
You may add custom disks, but you must keep the required system and data disks from the example (systemVolume, /var/lib/kubelet, /var/lib/containerd, /var/cpaas); see the sketch after the example below.
In the following example, <cp-dcs-machine-template-name> is the name of the control plane machine template:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSMachineTemplate
metadata:
  name: <cp-dcs-machine-template-name>
  namespace: cpaas-system
spec:
  template:
    spec:
      vmTemplateName: <vm-template-name>
      location:
        type: folder
        name: <folder-name>
      resource: # optional; the template default is used if unspecified
        type: cluster # cluster | host; optional
        name: <cluster-name> # optional
      vmConfig:
        dvSwitchName: <dv-switch-name> # optional
        portGroupName: <port-group-name> # optional
      dcsMachineCpuSpec:
        quantity: <control-plane-cpu>
      dcsMachineMemorySpec: # MB
        quantity: <control-plane-memory>
      dcsMachineDiskSpec: # GB
        - quantity: 0
          datastoreClusterName: <datastore-cluster-name>
          systemVolume: true
        - quantity: 10
          datastoreClusterName: <datastore-cluster-name>
          path: /var/lib/etcd
          format: xfs
        - quantity: 100
          datastoreClusterName: <datastore-cluster-name>
          path: /var/lib/kubelet
          format: xfs
        - quantity: 100
          datastoreClusterName: <datastore-cluster-name>
          path: /var/lib/containerd
          format: xfs
        - quantity: 100
          datastoreClusterName: <datastore-cluster-name>
          path: /var/cpaas
          format: xfs
      ipHostPoolRef:
        name: <control-plane-iphostname-pool-name>
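To add a custom disk (as permitted by the disk configuration rules above), append another entry to dcsMachineDiskSpec while keeping the required entries intact; a sketch with illustrative values (the mount path and size are hypothetical):

        - quantity: 200
          datastoreClusterName: <datastore-cluster-name>
          path: /data # hypothetical mount point for application data
          format: xfs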
Key Parameters
Configuring the KubeadmControlPlane
The current DCS control plane implementation relies on kubeadm, the Cluster API control plane provider, so a KubeadmControlPlane resource must be configured.
Most parameters in the example are tuned or required settings; a few can be customized for your environment.
In the following example, <cluster-name> is the resource name:
For component versions (such as <dns-image-tag> and <etcd-image-tag>), see the operating system support matrix.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster-name>
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  kubeadmConfigSpec:
    users:
      - name: boot
        sshAuthorizedKeys:
          - "<ssh-authorized-keys>"
    format: ignition
    files:
      - path: /etc/kubernetes/admission/psa-config.yaml
        owner: "root:root"
        permissions: "0644"
        content: |
          # ... (Admission Configuration Content) ...
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "_comment": "... (Kubelet Configuration Content) ..."
          }
      - path: /etc/kubernetes/encryption-provider.conf
        owner: "root:root"
        append: false
        permissions: "0644"
        content: |
          apiVersion: apiserver.config.k8s.io/v1
          kind: EncryptionConfiguration
          resources:
            - resources:
                - secrets
              providers:
                - aescbc:
                    keys:
                      - name: key1
                        secret: <base64-encoded-secret>
      # ... (Other files including admission config, audit policy, kubelet config) ...
    # ... (preKubeadmCommands & postKubeadmCommands) ...
    clusterConfiguration:
      imageRepository: cloud.alauda.io/alauda
      dns:
        imageTag: <dns-image-tag>
      etcd:
        local:
          imageTag: <etcd-image-tag>
      # ... (apiServer, controllerManager, scheduler configurations) ...
    # ... (initConfiguration & joinConfiguration) ...
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: <cp-dcs-machine-template-name>
  replicas: 3
  version: <control-plane-kubernetes-version>
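The <base64-encoded-secret> for the aescbc provider must be a base64-encoded 32-byte random key. A standard way to generate one:

# Generate a 32-byte random key and base64-encode it.
head -c 32 /dev/urandom | base64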
Key Parameters
Configuring the DCSCluster
DCSCluster is the infrastructure cluster declaration. Because the DCS platform currently provides no native load balancer, you must set up a load balancer manually in advance and bind it to an IP address from the pool configured in "Configuring the IP and Hostname Pool".
In the following example, <cluster-name> is the resource name:
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSCluster
metadata:
  name: "<cluster-name>"
  namespace: cpaas-system
spec:
  controlPlaneLoadBalancer: # high-availability configuration
    host: <load-balancer-ip-or-domain-name>
    port: 6443
    type: external
  credentialSecretRef: # references the credentials Secret
    name: <auth-secret-name>
  controlPlaneEndpoint: # Cluster API convention; keep consistent with controlPlaneLoadBalancer
    host: <load-balancer-ip-or-domain-name>
    port: 6443
  networkType: kube-ovn
  site: <site> # DCS platform parameter: the resource pool ID
Key Parameters
Configuring the Cluster
The Cluster resource in Cluster API declares the cluster itself and must reference the corresponding control plane and infrastructure cluster resources:
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: DCSCluster
    cpaas.io/kube-ovn-version: <kube-ovn-version>
    cpaas.io/kube-ovn-join-cidr: <kube-ovn-join-cidr>
  labels:
    cluster-type: DCS
  name: <cluster-name>
  namespace: cpaas-system
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - <pods-cidr>
    services:
      cidrBlocks:
        - <services-cidr>
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: <cluster-name>
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DCSCluster
    name: <cluster-name>
Key Parameters
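With all resources defined, apply the manifests to the global cluster in the order of the workflow above (a sketch; the file names are placeholders for wherever you saved each manifest):

kubectl apply -f dcs-credentials-secret.yaml        # Secret
kubectl apply -f control-plane-iphostname-pool.yaml # DCSIpHostnamePool
kubectl apply -f cp-dcs-machine-template.yaml       # DCSMachineTemplate
kubectl apply -f kubeadm-control-plane.yaml         # KubeadmControlPlane
kubectl apply -f dcs-cluster.yaml                   # DCSCluster
kubectl apply -f cluster.yaml                       # Cluster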
Node Deployment
See the node deployment page for instructions.
Cluster Verification
After all cluster resources are deployed, verify that the cluster was created successfully and is running normally.
Using the Alauda Container Platform Console
- Go to the Administrator view of the Alauda Container Platform console
- Navigate to Clusters → Clusters
- Locate the newly created cluster in the list
- Confirm that the cluster status shows Running
- Check that all control plane and worker nodes are Ready
Using kubectl
You can also verify the cluster with kubectl:
# Check the cluster status
kubectl get cluster -n cpaas-system <cluster-name>
# Check the control plane
kubectl get kubeadmcontrolplane -n cpaas-system <cluster-name>
# Check machine status
kubectl get machines -n cpaas-system
# Check the cluster deployment status
kubectl get clustermodule <cluster-name> -o jsonpath='{.status.base.deployStatus}'
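If the clusterctl CLI happens to be installed (an assumption; it is not required by this guide), its tree view summarizes the readiness of every Cluster API resource at once:

# Show the cluster and the status of all resources it owns.
clusterctl describe cluster <cluster-name> -n cpaas-system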
Expected Results
A successfully created cluster should show:
- Cluster status: Running or Provisioned
- All control plane machines: Running
- All worker nodes (if deployed): Running
- Kubernetes nodes: Ready
- Cluster module status: Completed
Appendix
Complete KubeadmControlPlane Configuration
The following is the complete KubeadmControlPlane configuration, including the full default audit policy, admission control, and file contents. You can copy the whole block and modify the <placeholders> as needed.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster-name>
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  kubeadmConfigSpec:
    users:
      - name: boot
        sshAuthorizedKeys:
          - "<ssh-authorized-keys>"
    format: ignition
    files:
      - path: /etc/kubernetes/admission/psa-config.yaml
        owner: "root:root"
        permissions: "0644"
        content: |
          apiVersion: apiserver.config.k8s.io/v1
          kind: AdmissionConfiguration
          plugins:
            - name: PodSecurity
              configuration:
                apiVersion: pod-security.admission.config.k8s.io/v1
                kind: PodSecurityConfiguration
                defaults:
                  enforce: "privileged"
                  enforce-version: "latest"
                  audit: "baseline"
                  audit-version: "latest"
                  warn: "baseline"
                  warn-version: "latest"
                exemptions:
                  usernames: []
                  runtimeClasses: []
                  namespaces:
                    - kube-system
                    - cpaas-system
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "protectKernelDefaults": true,
            "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
            "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key",
            "streamingConnectionIdleTimeout": "5m",
            "clientCAFile": "/etc/kubernetes/pki/ca.crt"
          }
      - path: /etc/kubernetes/encryption-provider.conf
        owner: "root:root"
        append: false
        permissions: "0644"
        content: |
          apiVersion: apiserver.config.k8s.io/v1
          kind: EncryptionConfiguration
          resources:
            - resources:
                - secrets
              providers:
                - aescbc:
                    keys:
                      - name: key1
                        secret: <base64-encoded-secret>
      - path: /etc/kubernetes/audit/policy.yaml
        owner: "root:root"
        append: false
        permissions: "0644"
        content: |
          apiVersion: audit.k8s.io/v1
          kind: Policy
          # Don't generate audit events for all requests in RequestReceived stage.
          omitStages:
            - "RequestReceived"
          rules:
            # The following requests were manually identified as high-volume and low-risk,
            # so drop them.
            - level: None
              users:
                - system:kube-controller-manager
                - system:kube-scheduler
                - system:serviceaccount:kube-system:endpoint-controller
              verbs: ["get", "update"]
              namespaces: ["kube-system"]
              resources:
                - group: "" # core
                  resources: ["endpoints"]
            # Don't log these read-only URLs.
            - level: None
              nonResourceURLs:
                - /healthz*
                - /version
                - /swagger*
            # Don't log events requests.
            - level: None
              resources:
                - group: "" # core
                  resources: ["events"]
            # Don't log devops requests.
            - level: None
              resources:
                - group: "devops.alauda.io"
            # Don't log get, list, and watch requests.
            - level: None
              verbs: ["get", "list", "watch"]
            # Don't log lease operations.
            - level: None
              resources:
                - group: "coordination.k8s.io"
                  resources: ["leases"]
            # Don't log access review and token review requests.
            - level: None
              resources:
                - group: "authorization.k8s.io"
                  resources: ["subjectaccessreviews", "selfsubjectaccessreviews"]
                - group: "authentication.k8s.io"
                  resources: ["tokenreviews"]
            # Don't log imagewhitelists and namespaceoverviews operations.
            - level: None
              resources:
                - group: "app.alauda.io"
                  resources: ["imagewhitelists"]
                - group: "k8s.io"
                  resources: ["namespaceoverviews"]
            # Secrets and ConfigMaps can contain sensitive & binary data,
            # so only log at the Metadata level.
            - level: Metadata
              resources:
                - group: "" # core
                  resources: ["secrets", "configmaps"]
            # devops installmanifests and katanomis can contain huge and sensitive data; only log at the Metadata level.
            - level: Metadata
              resources:
                - group: "operator.connectors.alauda.io"
                  resources: ["installmanifests"]
                - group: "operators.katanomi.dev"
                  resources: ["katanomis"]
            # Default level for known APIs.
            - level: RequestResponse
              resources:
                - group: "" # core
                - group: "aiops.alauda.io"
                - group: "apps"
                - group: "app.k8s.io"
                - group: "authentication.istio.io"
                - group: "auth.alauda.io"
                - group: "autoscaling"
                - group: "asm.alauda.io"
                - group: "clusterregistry.k8s.io"
                - group: "crd.alauda.io"
                - group: "infrastructure.alauda.io"
                - group: "monitoring.coreos.com"
                - group: "operators.coreos.com"
                - group: "networking.istio.io"
                - group: "extensions.istio.io"
                - group: "install.istio.io"
                - group: "security.istio.io"
                - group: "telemetry.istio.io"
                - group: "opentelemetry.io"
                - group: "networking.k8s.io"
                - group: "portal.alauda.io"
                - group: "rbac.authorization.k8s.io"
                - group: "storage.k8s.io"
                - group: "tke.cloud.tencent.com"
                - group: "devopsx.alauda.io"
                - group: "core.katanomi.dev"
                - group: "deliveries.katanomi.dev"
                - group: "integrations.katanomi.dev"
                - group: "artifacts.katanomi.dev"
                - group: "builds.katanomi.dev"
                - group: "versioning.katanomi.dev"
                - group: "sources.katanomi.dev"
                - group: "tekton.dev"
                - group: "operator.tekton.dev"
                - group: "eventing.knative.dev"
                - group: "flows.knative.dev"
                - group: "messaging.knative.dev"
                - group: "operator.knative.dev"
                - group: "sources.knative.dev"
                - group: "operator.devops.alauda.io"
                - group: "flagger.app"
                - group: "jaegertracing.io"
                - group: "velero.io"
                  resources: ["deletebackuprequests"]
                - group: "connectors.alauda.io"
                - group: "operator.connectors.alauda.io"
                  resources: ["connectorscores", "connectorsgits", "connectorsocis"]
            # Default level for all other requests.
            - level: Metadata
    preKubeadmCommands:
      - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
      - mkdir -p /run/cluster-api && restorecon -Rv /run/cluster-api
      - if [ -f /etc/disk-setup.sh ]; then bash /etc/disk-setup.sh; fi
    postKubeadmCommands:
      - chmod 600 /var/lib/kubelet/config.yaml
    clusterConfiguration:
      imageRepository: cloud.alauda.io/alauda
      dns:
        imageTag: <dns-image-tag>
      etcd:
        local:
          imageTag: <etcd-image-tag>
      apiServer:
        extraArgs:
          audit-log-format: json
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "200"
          profiling: "false"
          audit-log-mode: batch
          audit-log-path: /etc/kubernetes/audit/audit.log
          audit-policy-file: /etc/kubernetes/audit/policy.yaml
          tls-cipher-suites: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384"
          encryption-provider-config: /etc/kubernetes/encryption-provider.conf
          admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml
          tls-min-version: VersionTLS12
          kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt
        extraVolumes:
          - name: vol-dir-0
            hostPath: /etc/kubernetes
            mountPath: /etc/kubernetes
            pathType: Directory
      controllerManager:
        extraArgs:
          bind-address: "::"
          profiling: "false"
          tls-min-version: VersionTLS12
          flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
      scheduler:
        extraArgs:
          bind-address: "::"
          tls-min-version: VersionTLS12
          profiling: "false"
    initConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
    joinConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: <cp-dcs-machine-template-name>
  replicas: 3
  version: <control-plane-kubernetes-version>