Creating a Cluster on Huawei DCS
This document describes how to create a Kubernetes cluster on the Huawei DCS platform. YAML-based cluster creation is done through manifests. If Fleet Essentials is installed and the Alauda Container Platform DCS Infrastructure Provider is at version 1.0.13 or later, clusters can also be created through the web UI.
INFO
The web UI provides a guided workflow with validation, while YAML offers greater flexibility for automation.
Prerequisites
Before creating a cluster, make sure all of the following prerequisites are met:
1. Infrastructure resources
Configure the following infrastructure resources before creating the cluster:
- Cloud credential - access information for the DCS platform
- IP pool - network configuration for cluster nodes
- Machine templates - VM specifications for control plane and worker nodes
For detailed configuration instructions, see Infrastructure Resources for Huawei DCS.
2. Required plugin installation
Install the following plugins on the global cluster:
- Alauda Container Platform Kubeadm Provider
- Alauda Container Platform DCS Infrastructure Provider
For detailed installation instructions, see the installation guide.
3. Virtual machine template preparation
Before installing Kubernetes, you must:
- Upload the MicroOS image to the DCS platform
- Create a virtual machine template based on that image
- Ensure the template includes all required Kubernetes components
For details on the Kubernetes components included in each VM image, see the OS support matrix.
4. Network connectivity
Ensure that all nodes in the global cluster can reach the following ports on the DCS platform:
- Port 7443 (DCS API)
- Port 8443 (DCS Web Console)
Requirement: both ports must be reachable for cluster creation and management.
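A quick way to pre-check this connectivity from a global cluster node is a small TCP probe. The sketch below is illustrative only; the host name in the usage comment is a placeholder, not a real endpoint.

```python
import socket

def check_port(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: verify both required DCS ports from a global cluster node.
# for port in (7443, 8443):
#     print(port, check_port("dcs.example.internal", port))
```

Running this against your actual DCS platform address before starting the wizard can save a failed creation attempt later.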
5. LoadBalancer configuration
Before creating the cluster, configure a LoadBalancer for the Kubernetes API Server. The LoadBalancer distributes API server traffic across the control plane nodes to ensure high availability.
6. Public registry configuration
Configure public registry credentials. This includes:
- Configuring the registry repository address
- Setting up the correct authentication credentials
Using the web UI
Version requirement: this workflow requires Fleet Essentials and Alauda Container Platform DCS Infrastructure Provider 1.0.13 or later. If the provider version is earlier than 1.0.13, use YAML manifests instead.
Creation workflow
Cluster creation follows a 5-step wizard:
Step 1: Basic Info
↓
Step 2: Control Plane Node Pool
↓
Step 3: Worker Node Pools
↓
Step 4: Network
↓
Step 5: Review
Navigation path: Clusters → Clusters → Create Cluster → Select Huawei DCS
Step 1: Basic Info
Pre-checks:
Before creating the cluster, make sure that:
- The DCS VM template already exists on the DCS platform, and its MicroOS version matches the Kubernetes version
- A LoadBalancer has been configured for the Kubernetes API Server
Version restriction: only the latest Kubernetes version supported by the platform can be created.
Step 2: Control Plane Node Pool
The control plane node pool is fixed at 3 replicas to ensure high availability.
Validation: the associated IP pool must have enough available IP addresses (≥ 3).
Step 3: Worker Node Pools
You can add multiple worker node pools. Each node pool has the following configuration:
Validation rules:
- Node pool names must be unique within the cluster
- The IP pool must have enough available IP addresses (≥ Replicas)
- maxSurge and maxUnavailable must satisfy the constraint: if maxSurge = 0, then maxUnavailable > 0
Tip: prefix node pool names with the cluster name, separated by a hyphen (for example, mycluster-worker-1), to avoid naming collisions across clusters.
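The validation rules above can be sketched as a small check. This is an illustrative helper, not part of the platform; the parameter names are ours:

```python
def validate_node_pool(name, existing_names, available_ips, replicas,
                       max_surge, max_unavailable):
    """Apply the worker node pool validation rules described above.

    Returns a list of human-readable errors; an empty list means valid.
    """
    errors = []
    # Rule 1: node pool names must be unique within the cluster.
    if name in existing_names:
        errors.append(f"node pool name {name!r} is not unique in the cluster")
    # Rule 2: the IP pool must cover the requested replica count.
    if available_ips < replicas:
        errors.append(f"IP pool has {available_ips} free addresses, "
                      f"but {replicas} replicas are requested")
    # Rule 3: if maxSurge is 0, maxUnavailable must be positive,
    # otherwise a rolling update could never make progress.
    if max_surge == 0 and max_unavailable <= 0:
        errors.append("when maxSurge is 0, maxUnavailable must be > 0")
    return errors
```

For example, `validate_node_pool("mycluster-worker-1", set(), 5, 3, 0, 1)` passes all three rules and returns an empty list.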
Step 4: Network
Validation: the Pods CIDR and Services CIDR must not overlap.
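This overlap check can be reproduced with Python's standard `ipaddress` module; the CIDR values in the comments are illustrative, not platform defaults:

```python
import ipaddress

def cidrs_overlap(pods_cidr: str, services_cidr: str) -> bool:
    """Return True if the two CIDR blocks share any addresses."""
    pods = ipaddress.ip_network(pods_cidr)
    services = ipaddress.ip_network(services_cidr)
    return pods.overlaps(services)

# cidrs_overlap("10.3.0.0/16", "10.4.0.0/16")   -> False (valid combination)
# cidrs_overlap("10.3.0.0/16", "10.3.128.0/17") -> True  (rejected by the wizard)
```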
Step 5: Review
Review all configuration items before creating the cluster:
Basic Info:
- Name, display name, infrastructure credential
- Distribution Version, Kubernetes Version
- Cluster API address
Control Plane Node Pool:
- Machine template, including VM Template Name, OS Version, Kubernetes Version
- CPU, Memory, Replicas, SSH Keys
Worker Node Pools (list view):
- Node pool name, machine template, replica count
- Max surge, max unavailable, SSH Keys
Network:
- Pods CIDR, Services CIDR, Join CIDR
Click Create to start the cluster creation process.
Using YAML
Cluster creation workflow
When using YAML, you create Cluster API resources in the global cluster to provision the infrastructure and bootstrap a working Kubernetes cluster.
WARNING
Important namespace requirement
To ensure proper integration as a workload cluster, all resources must be deployed in the cpaas-system namespace. Deploying resources to any other namespace may cause integration problems.
Configuration workflow
Follow this order:
- Configure the KubeadmControlPlane
- Configure the DCSCluster
- Create the Cluster resource
Note: the infrastructure resources (Secret, DCSIpHostnamePool, DCSMachineTemplate) should be configured separately. For instructions, see Infrastructure Resources for Huawei DCS.
Network planning and load balancer
Before creating the control plane resources, plan the network architecture and deploy a load balancer for high availability.
Requirements:
- Network segmentation: plan the IP address ranges for the control plane nodes
- Load balancer: deploy and configure it for API server access
- API server address: prepare a stable VIP or load balancer address for the Kubernetes API Server
- Connectivity: ensure network connectivity between all components
Configure the KubeadmControlPlane
The KubeadmControlPlane resource defines the control plane configuration, including the Kubernetes version, node specifications, and bootstrap settings.
kubeadmcontrolplane.yaml
```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster-name>
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  kubeadmConfigSpec:
    users:
      - name: boot
        sshAuthorizedKeys:
          - "<ssh-authorized-keys>"
    format: ignition
    files:
      - path: /etc/kubernetes/admission/psa-config.yaml
        owner: "root:root"
        permissions: "0644"
        content: |
          # ... (Admission Configuration Content) ...
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "_comment": "... (Kubelet Configuration Content) ..."
          }
      # ... (other files) ...
    clusterConfiguration:
      imageRepository: cloud.alauda.io/alauda
      dns:
        imageTag: <dns-image-tag>
      etcd:
        local:
          imageTag: <etcd-image-tag>
      # ... (apiServer, controllerManager, scheduler) ...
    initConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
    joinConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: <cp-dcs-machine-template-name>
  replicas: 3
  version: <control-plane-kubernetes-version>
```
Parameter descriptions:
For component versions (such as <dns-image-tag> and <etcd-image-tag>), see the OS support matrix.
Configure the DCSCluster
DCSCluster is the infrastructure cluster declaration; it references the load balancer and the DCS platform credentials.
dcscluster.yaml
```yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: DCSCluster
metadata:
  name: "<cluster-name>"
  namespace: cpaas-system
spec:
  controlPlaneLoadBalancer:
    host: <load-balancer-ip-or-domain-name>
    port: 6443
    type: external
  credentialSecretRef:
    name: <auth-secret-name>
  controlPlaneEndpoint:
    host: <load-balancer-ip-or-domain-name>
    port: 6443
  networkType: kube-ovn
  site: <site>
```
Parameter descriptions:
Configure the Cluster
The Cluster resource declares the cluster and references the control plane and infrastructure resources.
cluster.yaml
```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: DCSCluster
    cpaas.io/kube-ovn-version: <kube-ovn-version>
    cpaas.io/kube-ovn-join-cidr: <kube-ovn-join-cidr>
  labels:
    cluster-type: DCS
  name: <cluster-name>
  namespace: cpaas-system
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
        - <pods-cidr>
    services:
      cidrBlocks:
        - <services-cidr>
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: <cluster-name>
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: DCSCluster
    name: <cluster-name>
```
Parameter descriptions:
Deploy nodes
For instructions on deploying worker nodes, see Node Management on Huawei DCS.
Cluster verification
After deploying all cluster resources, verify that the cluster was created successfully and is running normally.
Using the console
- Navigate to Clusters → Clusters
- Find the newly created cluster in the cluster list
- Verify that the cluster status shows Running
- Check that all control plane and worker nodes are Ready
Using kubectl
Alternatively, verify the cluster with kubectl commands:
```shell
# Check cluster status
kubectl get cluster -n cpaas-system <cluster-name>

# Verify control plane
kubectl get kubeadmcontrolplane -n cpaas-system <cluster-name>

# Check machine status
kubectl get machines -n cpaas-system

# Verify cluster deployment
kubectl get clustermodule <cluster-name> -o jsonpath='{.status.base.deployStatus}'
```
Expected results
A successfully created cluster should show:
- Cluster status: Running or Provisioned
- All control plane machines: Running
- All worker nodes (if deployed): Running
- Kubernetes nodes: Ready
- Cluster Module status: Completed
Appendix
Complete KubeadmControlPlane configuration
Below is the complete KubeadmControlPlane configuration, including all default audit policies, admission control, and file contents.
```yaml
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster-name>
  namespace: cpaas-system
  annotations:
    controlplane.cluster.x-k8s.io/skip-kube-proxy: ""
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  kubeadmConfigSpec:
    users:
      - name: boot
        sshAuthorizedKeys:
          - "<ssh-authorized-keys>"
    format: ignition
    files:
      - path: /etc/kubernetes/admission/psa-config.yaml
        owner: "root:root"
        permissions: "0644"
        content: |
          apiVersion: apiserver.config.k8s.io/v1
          kind: AdmissionConfiguration
          plugins:
            - name: PodSecurity
              configuration:
                apiVersion: pod-security.admission.config.k8s.io/v1
                kind: PodSecurityConfiguration
                defaults:
                  enforce: "privileged"
                  enforce-version: "latest"
                  audit: "baseline"
                  audit-version: "latest"
                  warn: "baseline"
                  warn-version: "latest"
                exemptions:
                  usernames: []
                  runtimeClasses: []
                  namespaces:
                    - kube-system
                    - cpaas-system
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "protectKernelDefaults": true,
            "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
            "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key",
            "streamingConnectionIdleTimeout": "5m",
            "clientCAFile": "/etc/kubernetes/pki/ca.crt"
          }
      - path: /etc/kubernetes/encryption-provider.conf
        owner: "root:root"
        append: false
        permissions: "0644"
        content: |
          apiVersion: apiserver.config.k8s.io/v1
          kind: EncryptionConfiguration
          resources:
            - resources:
                - secrets
              providers:
                - aescbc:
                    keys:
                      - name: key1
                        secret: <base64-encoded-secret>
      - path: /etc/kubernetes/audit/policy.yaml
        owner: "root:root"
        append: false
        permissions: "0644"
        content: |
          apiVersion: audit.k8s.io/v1
          kind: Policy
          omitStages:
            - "RequestReceived"
          rules:
            - level: None
              users:
                - system:kube-controller-manager
                - system:kube-scheduler
                - system:serviceaccount:kube-system:endpoint-controller
              verbs: ["get", "update"]
              namespaces: ["kube-system"]
              resources:
                - group: ""
                  resources: ["endpoints"]
            - level: None
              nonResourceURLs:
                - /healthz*
                - /version
                - /swagger*
            - level: None
              resources:
                - group: ""
                  resources: ["events"]
            - level: None
              resources:
                - group: "devops.alauda.io"
            - level: None
              verbs: ["get", "list", "watch"]
            - level: None
              resources:
                - group: "coordination.k8s.io"
                  resources: ["leases"]
            - level: None
              resources:
                - group: "authorization.k8s.io"
                  resources: ["subjectaccessreviews", "selfsubjectaccessreviews"]
                - group: "authentication.k8s.io"
                  resources: ["tokenreviews"]
            - level: None
              resources:
                - group: "app.alauda.io"
                  resources: ["imagewhitelists"]
                - group: "k8s.io"
                  resources: ["namespaceoverviews"]
            - level: Metadata
              resources:
                - group: ""
                  resources: ["secrets", "configmaps"]
            - level: Metadata
              resources:
                - group: "operator.connectors.alauda.io"
                  resources: ["installmanifests"]
                - group: "operators.katanomi.dev"
                  resources: ["katanomis"]
            - level: RequestResponse
              resources:
                - group: ""
                - group: "aiops.alauda.io"
                - group: "apps"
                - group: "app.k8s.io"
                - group: "authentication.istio.io"
                - group: "auth.alauda.io"
                - group: "autoscaling"
                - group: "asm.alauda.io"
                - group: "clusterregistry.k8s.io"
                - group: "crd.alauda.io"
                - group: "infrastructure.alauda.io"
                - group: "monitoring.coreos.com"
                - group: "operators.coreos.com"
                - group: "networking.istio.io"
                - group: "extensions.istio.io"
                - group: "install.istio.io"
                - group: "security.istio.io"
                - group: "telemetry.istio.io"
                - group: "opentelemetry.io"
                - group: "networking.k8s.io"
                - group: "portal.alauda.io"
                - group: "rbac.authorization.k8s.io"
                - group: "storage.k8s.io"
                - group: "tke.cloud.tencent.com"
                - group: "devopsx.alauda.io"
                - group: "core.katanomi.dev"
                - group: "deliveries.katanomi.dev"
                - group: "integrations.katanomi.dev"
                - group: "artifacts.katanomi.dev"
                - group: "builds.katanomi.dev"
                - group: "versioning.katanomi.dev"
                - group: "sources.katanomi.dev"
                - group: "tekton.dev"
                - group: "operator.tekton.dev"
                - group: "eventing.knative.dev"
                - group: "flows.knative.dev"
                - group: "messaging.knative.dev"
                - group: "operator.knative.dev"
                - group: "sources.knative.dev"
                - group: "operator.devops.alauda.io"
                - group: "flagger.app"
                - group: "jaegertracing.io"
                - group: "velero.io"
                  resources: ["deletebackuprequests"]
                - group: "connectors.alauda.io"
                - group: "operator.connectors.alauda.io"
                  resources: ["connectorscores", "connectorsgits", "connectorsocis"]
            - level: Metadata
    preKubeadmCommands:
      - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
      - mkdir -p /run/cluster-api && restorecon -Rv /run/cluster-api
      - if [ -f /etc/disk-setup.sh ]; then bash /etc/disk-setup.sh; fi
    postKubeadmCommands:
      - chmod 600 /var/lib/kubelet/config.yaml
    clusterConfiguration:
      imageRepository: cloud.alauda.io/alauda
      dns:
        imageTag: <dns-image-tag>
      etcd:
        local:
          imageTag: <etcd-image-tag>
      apiServer:
        extraArgs:
          audit-log-format: json
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "200"
          profiling: "false"
          audit-log-mode: batch
          audit-log-path: /etc/kubernetes/audit/audit.log
          audit-policy-file: /etc/kubernetes/audit/policy.yaml
          tls-cipher-suites: "TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384"
          encryption-provider-config: /etc/kubernetes/encryption-provider.conf
          admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml
          tls-min-version: VersionTLS12
          kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt
        extraVolumes:
          - name: vol-dir-0
            hostPath: /etc/kubernetes
            mountPath: /etc/kubernetes
            pathType: Directory
      controllerManager:
        extraArgs:
          bind-address: "::"
          profiling: "false"
          tls-min-version: VersionTLS12
          flex-volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
      scheduler:
        extraArgs:
          bind-address: "::"
          tls-min-version: VersionTLS12
          profiling: "false"
    initConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
    joinConfiguration:
      patches:
        directory: /etc/kubernetes/patches
      nodeRegistration:
        kubeletExtraArgs:
          node-labels: "kube-ovn/role=master"
          provider-id: PROVIDER_ID
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          protect-kernel-defaults: "true"
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: DCSMachineTemplate
      name: <cp-dcs-machine-template-name>
  replicas: 3
  version: <control-plane-kubernetes-version>
```
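The <base64-encoded-secret> placeholder in the EncryptionConfiguration file must be a base64-encoded random key; for the aescbc provider, Kubernetes accepts 16-, 24-, or 32-byte keys, and 32 bytes is the common choice. A minimal sketch for generating one, assuming Python is available on your workstation:

```python
import base64
import secrets

def generate_encryption_key(length: int = 32) -> str:
    """Return a base64-encoded random key suitable for the aescbc provider."""
    return base64.b64encode(secrets.token_bytes(length)).decode("ascii")

# key = generate_encryption_key()
# Paste the resulting value into the <base64-encoded-secret> placeholder.
```

The equivalent shell one-liner is `head -c 32 /dev/urandom | base64`.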
Next steps
After creating the cluster: