Creating a VMware vSphere Cluster from the global Cluster

This document describes how to create a VMware vSphere workload cluster from the global cluster using the standard CAPV pattern with a direct vCenter connection. The procedure covers a minimal supported topology: one datacenter, one NIC per node, and static IP allocation through VSphereMachineConfigPool.

Scenarios

Use this document in the following scenarios:

  • You want to create the first baseline VMware vSphere workload cluster in your environment.
  • You are using one datacenter and one NIC per node for initial validation.
  • You want to keep the first deployment as simple as possible before enabling advanced placement or networking features.

This document applies to the following deployment model:

  • CAPV connects directly to vCenter.
  • Both control plane and worker nodes use VSphereMachineConfigPool for static IP allocation and data disk configuration.
  • ClusterResourceSet automatically delivers the vSphere CPI components.
  • The first validation uses one datacenter and one NIC per node.

This document does not apply to the following scenarios:

  • Deployments that rely on vSphere Supervisor or vm-operator.
  • Deployments that do not use VSphereMachineConfigPool.
  • First deployments that enable multiple datacenters, multiple NICs, and complex disk expansion at the same time.

This document is written for the current platform environment. The kube-ovn delivery path depends on platform controllers that consume annotations on the Cluster resource, so this workflow is not a general-purpose guide for standalone CAPV deployments outside the platform context.

Prerequisites

Before you begin, make sure the following conditions are met:

  1. You have completed value collection as described in Preparing Parameters for a VMware vSphere Cluster.
  2. The global cluster can reach vCenter.
  3. The target template, network, datastore, and vCenter resource pool are available.
  4. The control plane VIP and load balancer are ready.
  5. All required static IP addresses are allocated and not in use.
  6. The ClusterResourceSet=true feature gate is enabled.
  7. The platform has a valid public registry configuration.
  8. The platform can process the cluster annotations required to install the network plugin.

Key objects

ClusterResourceSet

ClusterResourceSet is a Cluster API resource in the management cluster. Once the workload API server is reachable, it applies the referenced ConfigMap and Secret resources to the workload cluster.

In this workflow, ClusterResourceSet is used to deliver the vSphere CPI resources automatically.

vSphere CPI component

The vSphere CPI component is delivered to the workload cluster through ClusterResourceSet. It connects workload nodes to the vSphere infrastructure so that the cluster can report infrastructure identity and complete cloud-provider initialization.

machine config pool

A machine config pool is a VSphereMachineConfigPool custom resource. In the baseline workflow:

  • One machine config pool is used for control plane nodes.
  • One machine config pool is used for worker nodes.

Each node slot contains a hostname, a datacenter, a static IP assignment, and optional data disk definitions.

For network configuration, distinguish between the following fields:

  • networkName is the vCenter network or port group name.
  • deviceName is the NIC name inside the guest operating system.

If deviceName is set, CAPV writes the value into the generated guest-network metadata. If the field is omitted, the current implementation typically uses NIC names such as eth0, eth1, and eth2 in NIC order.

Also distinguish between the following value formats:

  • Node IP addresses are specified together with a prefix length, for example 10.10.10.11/24.
  • The gateway field contains only the gateway IP address, for example 10.10.10.1.

VM template requirements

The VM template used by this workflow should meet the following minimum requirements:

  1. It uses the operating system required by the target platform environment.
  2. It contains cloud-init.
  3. It contains VMware Tools or open-vm-tools.
  4. It contains containerd.
  5. It contains the base components required for kubeadm bootstrap.
  6. It contains pre-exported container image tar files under /root/images/. These files are imported into containerd by capv-load-local-images.sh before kubeadm runs, so node bootstrap does not depend on pulling images from a remote registry.
  7. The /root/images/*.tar files must contain the sandbox (pause) image, and its reference must exactly match the sandbox_image value (containerd v1) or sandbox value (containerd v2) configured in /etc/containerd/config.toml. For example, if containerd is configured with sandbox_image = "registry.example.com/tkestack/pause:3.10", one of the tar files must contain that exact image reference. A mismatch causes containerd to pull the sandbox image over the network, which defeats the purpose of local preloading and fails in air-gapped environments.

Static IP configuration, hostname injection, and other initialization settings depend on cloud-init. Node IP reporting depends on the guest tools.
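
Requirement 7 can be checked on a node built from the template once the preload script has run. The following is a minimal sketch, assuming the default config.toml layout; adjust the grep pattern for your containerd version:

# Print the sandbox image reference configured in containerd
# (sandbox_image for containerd v1 config, sandbox for v2).
grep -E 'sandbox_image|sandbox *=' /etc/containerd/config.toml

# List the imported pause images in the k8s.io namespace and confirm
# that one reference matches the configured value exactly.
ctr -n k8s.io images ls | grep pause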

Local file layout

Create a local working directory and save the manifests in the following layout:

capv-cluster/
├── 00-namespace.yaml
├── 01-vsphere-credentials-secret.yaml
├── 02-vspheremachineconfigpool-control-plane.yaml
├── 03-vspheremachineconfigpool-worker.yaml
├── 10-cluster.yaml
├── 15-vsphere-cpi-clusterresourceset.yaml
├── 20-control-plane.yaml
└── 30-workers-md-0.yaml

Create the directory with the following commands:

mkdir -p ./capv-cluster
cd ./capv-cluster

Procedure

Verify the environment

Run the following commands from the management environment to verify the minimum prerequisites:

kubectl get ns
kubectl get minfo -l cpaas.io/module-name=cluster-api-provider-vsphere
kubectl get minfo -l cpaas.io/module-name=cluster-api-provider-kubeadm
kubectl -n cpaas-system get deploy capi-controller-manager -o jsonpath='{.spec.template.spec.containers[0].args}'
kubectl -n cpaas-system get secret public-registry-credential -o jsonpath='{.data.content}'

Confirm the following results:

  • The management cluster is reachable.
  • The Alauda Container Platform Kubeadm Provider and the Alauda Container Platform VMware vSphere Infrastructure Provider are running.
  • The controller arguments include ClusterResourceSet=true.
  • The public registry credential data.content is not empty.

Before you continue, also check the following items (see the sketch after this list for one way to run the connectivity checks):

  • The vCenter server address is reachable.
  • The vCenter username and password are valid.
  • The thumbprint is correct.
  • The template name is correct.
  • The template can be resolved in the target datacenter.
  • If VMs are cloned with fullClone, the template system disk must not be larger than the diskGiB value used in the later manifests. If CAPV completes a linkedClone, the system disk keeps the template size and diskGiB is ignored.
  • VMware Tools or open-vm-tools is installed in the template.
  • The VIP exists and port 6443 is reachable from the execution environment.
  • The ownership model of the load balancer used for real-server maintenance is clear.
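
A minimal sketch for the connectivity checks, assuming openssl and curl are available on the execution host; replace the placeholders with your own values. The VIP check may only succeed once the load balancer frontend is serving:

# Print the SHA-1 thumbprint presented by vCenter and compare it with <thumbprint>.
openssl s_client -connect <vsphere_server>:443 </dev/null 2>/dev/null | openssl x509 -noout -fingerprint -sha1

# Confirm that the control plane VIP answers on the API server port.
curl -kv --connect-timeout 5 https://<vip>:6443/ || true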

Create the namespace and the vCenter credentials Secret

Create the namespace used to store the workload cluster objects.

This workflow stores the workload cluster objects in the cpaas-system namespace. In the manifests and commands below, replace every <namespace> placeholder with cpaas-system.
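
If you prefer to substitute the namespace placeholder in bulk once all manifests are written, a single sed pass over the working directory is enough; this is only a convenience and assumes no other content uses the literal string <namespace>:

sed -i 's/<namespace>/cpaas-system/g' ./*.yaml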

00-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: <namespace>

Create the vCenter credentials Secret referenced by VSphereCluster.spec.identityRef.

01-vsphere-credentials-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: <credentials_secret_name>
  namespace: <namespace>
type: Opaque
stringData:
  username: "<vsphere_username>"
  password: "<vsphere_password>"

Apply both manifests:

kubectl apply -f 00-namespace.yaml
kubectl apply -f 01-vsphere-credentials-secret.yaml
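
Optionally confirm that the credentials Secret exists before moving on:

kubectl -n cpaas-system get secret <credentials_secret_name>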

Create the Cluster and VSphereCluster objects

Create the base cluster manifest, which contains the workload cluster network settings, the control plane endpoint, and the vCenter connection settings.

10-cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <cluster_name>
  namespace: <namespace>
  labels:
    cluster.x-k8s.io/cluster-name: <cluster_name>
    cluster-type: VSphere
    addons.cluster.x-k8s.io/vsphere-cpi: "enabled"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: VSphereCluster
    cpaas.io/sentry-deploy-type: Baremetal
    cpaas.io/alb-address-type: ClusterAddress
    cpaas.io/network-type: kube-ovn
    cpaas.io/kube-ovn-version: <kube_ovn_version>
    cpaas.io/kube-ovn-join-cidr: <kube_ovn_join_cidr>
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - <pod_cidr>
    services:
      cidrBlocks:
      - <service_cidr>
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: <cluster_name>
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: <cluster_name>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: <cluster_name>
  namespace: <namespace>
spec:
  controlPlaneEndpoint:
    host: "<vip>"
    port: <api_server_port>
  identityRef:
    kind: Secret
    name: <credentials_secret_name>
  server: "<vsphere_server>"
  thumbprint: "<thumbprint>"

Apply the manifest:

kubectl apply -f 10-cluster.yaml
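
Optionally confirm that both objects now exist in the namespace:

kubectl -n <namespace> get cluster,vspherecluster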

Create the vSphere CPI delivery resources

Create a ClusterResourceSet so that the workload cluster automatically receives the vSphere CPI configuration and manifests once the workload API server is reachable.

WARNING

The CPI ConfigMap, Secret, and ClusterResourceSet resources must be created in the same namespace as the Cluster resource. In this guide, that namespace is cpaas-system. A ClusterResourceSet only matches clusters in its own namespace; deploying it to a different namespace silently prevents resource delivery.

INFO

The kube-ovn settings in the Cluster annotations are consumed by platform controllers. This document does not install the network plugin directly.

TIP

This manifest is long and contains nested YAML in the data fields. Validate the manifest before applying it: kubectl apply --dry-run=client -f 15-vsphere-cpi-clusterresourceset.yaml

15-vsphere-cpi-clusterresourceset.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster_name>-vsphere-cpi-config
  namespace: <namespace>
data:
  data: |
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cloud-config
      namespace: kube-system
    data:
      vsphere.conf: |
        [Global]
        secret-name = "vsphere-cloud-secret"
        secret-namespace = "kube-system"
        service-account = "cloud-controller-manager"
        port = "443"
        insecure-flag = "<cpi_insecure_flag>"
        datacenters = "<cpi_datacenters>"

        [Labels]
        zone = "k8s-zone"
        region = "k8s-region"

        [VirtualCenter "<vsphere_server>"]
---
apiVersion: v1
kind: Secret
metadata:
  name: <cluster_name>-vsphere-cpi-secret
  namespace: <namespace>
type: addons.cluster.x-k8s.io/resource-set
stringData:
  data: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: vsphere-cloud-secret
      namespace: kube-system
    type: Opaque
    stringData:
      <vsphere_server>.username: <vsphere_username>
      <vsphere_server>.password: <vsphere_password>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster_name>-vsphere-cpi-manifests
  namespace: <namespace>
data:
  data: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cloud-controller-manager
      namespace: kube-system
    automountServiceAccountToken: false
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:cloud-controller-manager
    rules:
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["create", "patch", "update"]
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["*"]
    - apiGroups: [""]
      resources: ["nodes/status"]
      verbs: ["patch"]
    - apiGroups: [""]
      resources: ["services"]
      verbs: ["list", "patch", "update", "watch"]
    - apiGroups: [""]
      resources: ["services/status"]
      verbs: ["patch"]
    - apiGroups: [""]
      resources: ["serviceaccounts"]
      verbs: ["create", "get", "list", "watch", "update"]
    - apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["get", "list", "update", "watch"]
    - apiGroups: [""]
      resources: ["endpoints"]
      verbs: ["create", "get", "list", "watch", "update"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      verbs: ["get", "list", "watch", "create", "update"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: servicecatalog.k8s.io:apiserver-authentication-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - apiGroup: ""
      kind: ServiceAccount
      name: cloud-controller-manager
      namespace: kube-system
    - apiGroup: ""
      kind: User
      name: cloud-controller-manager
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:cloud-controller-manager
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:cloud-controller-manager
    subjects:
    - kind: ServiceAccount
      name: cloud-controller-manager
      namespace: kube-system
    - kind: User
      name: cloud-controller-manager
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        component: cloud-controller-manager
        tier: control-plane
        k8s-app: vsphere-cloud-controller-manager
      name: vsphere-cloud-controller-manager
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: vsphere-cloud-controller-manager
      updateStrategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            component: cloud-controller-manager
            k8s-app: vsphere-cloud-controller-manager
        spec:
          securityContext:
            runAsUser: 1001
          automountServiceAccountToken: true
          # Optional: required when the CPI image is stored in a private
          # registry that needs authentication. The platform automatically
          # syncs a dockerconfigjson secret named "global-registry-auth"
          # into every namespace of the workload cluster when the
          # management-cluster secret "public-registry-credential"
          # (data.content) is configured. If your environment does not
          # use a private registry, remove the imagePullSecrets block.
          imagePullSecrets:
          - name: global-registry-auth
          serviceAccountName: cloud-controller-manager
          hostNetwork: true
          tolerations:
          - operator: Exists
          - key: node.cloudprovider.kubernetes.io/uninitialized
            value: "true"
            effect: NoSchedule
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          - key: node.kubernetes.io/not-ready
            effect: NoSchedule
            operator: Exists
          containers:
          - name: vsphere-cloud-controller-manager
            image: <image_registry>/ait/cloud-provider-vsphere:<cpi_image_tag>
            args:
            - --v=2
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
            volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            resources:
              requests:
                cpu: 200m
          volumes:
          - name: vsphere-config-volume
            configMap:
              name: cloud-config
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        component: cloud-controller-manager
      name: vsphere-cloud-controller-manager
      namespace: kube-system
    spec:
      type: NodePort
      ports:
      - port: 43001
        protocol: TCP
        targetPort: 43001
      selector:
        component: cloud-controller-manager
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: <cluster_name>-vsphere-cpi
  namespace: <namespace>
spec:
  strategy: Reconcile
  clusterSelector:
    matchLabels:
      addons.cluster.x-k8s.io/vsphere-cpi: "enabled"
  resources:
  - name: <cluster_name>-vsphere-cpi-config
    kind: ConfigMap
  - name: <cluster_name>-vsphere-cpi-secret
    kind: Secret
  - name: <cluster_name>-vsphere-cpi-manifests
    kind: ConfigMap

Apply the manifest:

kubectl apply -f 15-vsphere-cpi-clusterresourceset.yaml
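
You can confirm that the ClusterResourceSet exists and that the Cluster carries the addons.cluster.x-k8s.io/vsphere-cpi=enabled label that the clusterSelector matches on:

kubectl -n <namespace> get clusterresourceset <cluster_name>-vsphere-cpi
kubectl -n <namespace> get cluster <cluster_name> --show-labels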

Create the machine config pools

Create the control plane machine config pool.

INFO

Each node slot declares its NIC layout under network.primary (required) and network.additional (an optional list). networkName on the primary NIC is required. The provider derives the Kubernetes node name, the kubelet serving certificate DNS SAN, and the kubelet node-ip from the hostname and the resolved primary NIC address. hostname must be a valid DNS-1123 subdomain.

INFO

deviceName is optional. If you do not need to force the guest NIC name, remove the deviceName line from each node slot. The provider then assigns NIC names such as eth0 and eth1 in NIC order.

02-vspheremachineconfigpool-control-plane.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineConfigPool
metadata:
  name: <cp_pool_name>
  namespace: <namespace>
spec:
  clusterRef:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: <cluster_name>
  datacenter: "<default_datacenter>"
  releaseDelayHours: <release_delay_hours>
  configs:
  - hostname: "<cp_node_name_1>"
    datacenter: "<master_01_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_01_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true
  - hostname: "<cp_node_name_2>"
    datacenter: "<master_02_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_02_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true
  - hostname: "<cp_node_name_3>"
    datacenter: "<master_03_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_03_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true

Create the worker machine config pool.

03-vspheremachineconfigpool-worker.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineConfigPool
metadata:
  name: <worker_pool_name>
  namespace: <namespace>
spec:
  clusterRef:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: <cluster_name>
  datacenter: "<default_datacenter>"
  releaseDelayHours: <release_delay_hours>
  configs:
  - hostname: "<worker_node_name_1>"
    datacenter: "<worker_01_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<worker_01_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <worker_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <worker_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4

Apply both manifests:

kubectl apply -f 02-vspheremachineconfigpool-control-plane.yaml
kubectl apply -f 03-vspheremachineconfigpool-worker.yaml
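
Optionally confirm that both pools exist; this assumes kubectl resolves the VSphereMachineConfigPool kind from the installed CRD:

kubectl -n <namespace> get vspheremachineconfigpool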

Create the control plane objects

Create the VSphereMachineTemplate and KubeadmControlPlane objects. Replace the placeholders in the full template below with the values collected in the checklist document.

Both cloneMode and diskGiB are kept in the template because CAPV accepts both fields. In practice, diskGiB affects the system disk only when the actual clone operation is a fullClone. If cloneMode is linkedClone and the template has a usable snapshot, CAPV completes a linked clone and the system disk keeps the source template size. If no usable snapshot exists, CAPV falls back to fullClone, and diskGiB takes effect again.

20-control-plane.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: <cluster_name>-control-plane
  namespace: <namespace>
spec:
  template:
    spec:
      server: "<vsphere_server>"
      template: "<template_name>"
      cloneMode: <clone_mode>
      datastore: "<cp_system_datastore>"
      diskGiB: <cp_system_disk_gib>
      memoryMiB: <cp_memory_mib>
      numCPUs: <cp_num_cpus>
      os: Linux
      powerOffMode: <power_off_mode>
      network:
        devices:
        - networkName: "<nic1_network_name>"
      machineConfigPoolRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineConfigPool
        name: <cp_pool_name>
        namespace: <namespace>
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster_name>
  namespace: <namespace>
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  version: "<k8s_version>"
  replicas: <cp_replicas>
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    metadata:
      labels:
        node-role.kubernetes.io/control-plane: ""
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: <cluster_name>-control-plane
  kubeadmConfigSpec:
    users:
    - name: boot
      sudo: ALL=(ALL) NOPASSWD:ALL
      sshAuthorizedKeys:
      - "<ssh_public_key>"
    files:
    - path: /etc/kubernetes/admission/psa-config.yaml
      owner: "root:root"
      permissions: "0644"
      content: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: AdmissionConfiguration
        plugins:
        - name: PodSecurity
          configuration:
            apiVersion: pod-security.admission.config.k8s.io/v1
            kind: PodSecurityConfiguration
            defaults:
              enforce: "privileged"
              enforce-version: "latest"
              audit: "baseline"
              audit-version: "latest"
              warn: "baseline"
              warn-version: "latest"
            exemptions:
              usernames: []
              runtimeClasses: []
              namespaces:
              - kube-system
              - <namespace>
    - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
      owner: "root:root"
      permissions: "0644"
      content: |
        {
          "apiVersion": "kubelet.config.k8s.io/v1beta1",
          "kind": "KubeletConfiguration",
          "protectKernelDefaults": true,
          "streamingConnectionIdleTimeout": "5m",
          "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
          "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key"
        }
    # Generate the encryption key with: head -c 32 /dev/urandom | base64
    - path: /etc/kubernetes/encryption-provider.conf
      owner: "root:root"
      append: false
      permissions: "0644"
      content: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: EncryptionConfiguration
        resources:
        - resources:
          - secrets
          providers:
          - aescbc:
              keys:
              - name: key1
                secret: <encryption_provider_secret>
    - path: /etc/kubernetes/audit/policy.yaml
      owner: "root:root"
      append: false
      permissions: "0644"
      content: |
        apiVersion: audit.k8s.io/v1
        kind: Policy
        omitStages:
        - "RequestReceived"
        rules:
        - level: None
          users:
          - system:kube-controller-manager
          - system:kube-scheduler
          - system:serviceaccount:kube-system:endpoint-controller
          verbs: ["get", "update"]
          namespaces: ["kube-system"]
          resources:
          - group: ""
            resources: ["endpoints"]
        - level: None
          nonResourceURLs:
          - /healthz*
          - /version
          - /swagger*
        - level: None
          resources:
          - group: ""
            resources: ["events"]
        - level: None
          resources:
          - group: "devops.alauda.io"
        - level: None
          verbs: ["get", "list", "watch"]
        - level: None
          resources:
          - group: "coordination.k8s.io"
            resources: ["leases"]
        - level: None
          resources:
          - group: "authorization.k8s.io"
            resources: ["subjectaccessreviews", "selfsubjectaccessreviews"]
          - group: "authentication.k8s.io"
            resources: ["tokenreviews"]
        - level: None
          resources:
          - group: "app.alauda.io"
            resources: ["imagewhitelists"]
          - group: "k8s.io"
            resources: ["namespaceoverviews"]
        - level: Metadata
          resources:
          - group: ""
            resources: ["secrets", "configmaps"]
        - level: Metadata
          resources:
          - group: "operator.connectors.alauda.io"
            resources: ["installmanifests"]
          - group: "operators.katanomi.dev"
            resources: ["katanomis"]
        - level: RequestResponse
          resources:
          - group: ""
          - group: "aiops.alauda.io"
          - group: "apps"
          - group: "app.k8s.io"
          - group: "authentication.istio.io"
          - group: "auth.alauda.io"
          - group: "autoscaling"
          - group: "asm.alauda.io"
          - group: "clusterregistry.k8s.io"
          - group: "crd.alauda.io"
          - group: "infrastructure.alauda.io"
          - group: "monitoring.coreos.com"
          - group: "operators.coreos.com"
          - group: "networking.istio.io"
          - group: "extensions.istio.io"
          - group: "install.istio.io"
          - group: "security.istio.io"
          - group: "telemetry.istio.io"
          - group: "opentelemetry.io"
          - group: "networking.k8s.io"
          - group: "portal.alauda.io"
          - group: "rbac.authorization.k8s.io"
          - group: "storage.k8s.io"
          - group: "tke.cloud.tencent.com"
          - group: "devopsx.alauda.io"
          - group: "core.katanomi.dev"
          - group: "deliveries.katanomi.dev"
          - group: "integrations.katanomi.dev"
          - group: "artifacts.katanomi.dev"
          - group: "builds.katanomi.dev"
          - group: "versioning.katanomi.dev"
          - group: "sources.katanomi.dev"
          - group: "tekton.dev"
          - group: "operator.tekton.dev"
          - group: "eventing.knative.dev"
          - group: "flows.knative.dev"
          - group: "messaging.knative.dev"
          - group: "operator.knative.dev"
          - group: "sources.knative.dev"
          - group: "operator.devops.alauda.io"
          - group: "flagger.app"
          - group: "jaegertracing.io"
          - group: "velero.io"
            resources: ["deletebackuprequests"]
          - group: "connectors.alauda.io"
          - group: "operator.connectors.alauda.io"
            resources: ["connectorscores", "connectorsgits", "connectorsocis"]
        - level: Metadata
    - path: /usr/local/bin/capv-load-local-images.sh
      owner: "root:root"
      permissions: "0755"
      content: |
        #!/bin/bash
        set -euo pipefail
        until mountpoint -q /var/lib/containerd; do
          echo "waiting for /var/lib/containerd mount"
          sleep 1
        done
        systemctl restart containerd
        until systemctl is-active --quiet containerd; do
          echo "waiting for containerd"
          sleep 1
        done
        if [ ! -d "/root/images" ]; then
          echo "ERROR: /root/images directory not found" >&2
          exit 1
        fi
        image_count=0
        for image_file in /root/images/*.tar; do
          if [ -f "$image_file" ]; then
            echo "importing image: $image_file"
            ctr -n k8s.io images import "$image_file"
            image_count=$((image_count + 1))
          fi
        done
        if [ "$image_count" -eq 0 ]; then
          echo "ERROR: no tar files found in /root/images" >&2
          exit 1
        fi
        echo "imported $image_count images"
    preKubeadmCommands:
    - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
    - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts
    - echo "127.0.0.1   {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
    - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
    - /usr/local/bin/capv-load-local-images.sh
    postKubeadmCommands:
    - chmod 600 /var/lib/kubelet/config.yaml
    clusterConfiguration:
      imageRepository: <image_registry>/tkestack
      dns:
        imageTag: <dns_image_tag>
      etcd:
        local:
          imageTag: <etcd_image_tag>
      apiServer:
        extraArgs:
          admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml
          audit-log-format: json
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "200"
          audit-log-mode: batch
          audit-log-path: /etc/kubernetes/audit/audit.log
          audit-policy-file: /etc/kubernetes/audit/policy.yaml
          encryption-provider-config: /etc/kubernetes/encryption-provider.conf
          kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt
          profiling: "false"
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          tls-min-version: VersionTLS12
        extraVolumes:
        - hostPath: /etc/kubernetes
          mountPath: /etc/kubernetes
          name: vol-dir-0
          pathType: Directory
      controllerManager:
        extraArgs:
          bind-address: "::"
          cloud-provider: external
          profiling: "false"
          tls-min-version: VersionTLS12
      scheduler:
        extraArgs:
          bind-address: "::"
          profiling: "false"
          tls-min-version: VersionTLS12
    initConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        ignorePreflightErrors:
        - ImagePull
        kubeletExtraArgs:
          cloud-provider: external
          node-labels: kube-ovn/role=master
        name: '{{ local_hostname }}'
      patches:
        directory: /etc/kubernetes/patches
    joinConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        ignorePreflightErrors:
        - ImagePull
        kubeletExtraArgs:
          cloud-provider: external
          node-labels: kube-ovn/role=master
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
        name: '{{ local_hostname }}'
      patches:
        directory: /etc/kubernetes/patches

Apply the manifest:

kubectl apply -f 20-control-plane.yaml

Create the worker objects

Create the worker machine template, the bootstrap template, and the MachineDeployment.

30-workers-md-0.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: <cluster_name>-worker
  namespace: <namespace>
spec:
  template:
    spec:
      server: "<vsphere_server>"
      template: "<template_name>"
      cloneMode: <clone_mode>
      datastore: "<worker_system_datastore>"
      diskGiB: <worker_system_disk_gib>
      memoryMiB: <worker_memory_mib>
      numCPUs: <worker_num_cpus>
      os: Linux
      powerOffMode: <power_off_mode>
      network:
        devices:
        - networkName: "<nic1_network_name>"
      machineConfigPoolRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineConfigPool
        name: <worker_pool_name>
        namespace: <namespace>
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <cluster_name>-worker-bootstrap
  namespace: <namespace>
spec:
  template:
    spec:
      files:
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "protectKernelDefaults": true,
            "staticPodPath": null,
            "streamingConnectionIdleTimeout": "5m",
            "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
            "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key"
          }
      - path: /usr/local/bin/capv-load-local-images.sh
        owner: "root:root"
        permissions: "0755"
        content: |
          #!/bin/bash
          set -euo pipefail
          until mountpoint -q /var/lib/containerd; do
            echo "waiting for /var/lib/containerd mount"
            sleep 1
          done
          systemctl restart containerd
          until systemctl is-active --quiet containerd; do
            echo "waiting for containerd"
            sleep 1
          done
          if [ ! -d "/root/images" ]; then
            echo "ERROR: /root/images directory not found" >&2
            exit 1
          fi
          image_count=0
          for image_file in /root/images/*.tar; do
            if [ -f "$image_file" ]; then
              echo "importing image: $image_file"
              ctr -n k8s.io images import "$image_file"
              image_count=$((image_count + 1))
            fi
          done
          if [ "$image_count" -eq 0 ]; then
            echo "ERROR: no tar files found in /root/images" >&2
            exit 1
          fi
          echo "imported $image_count images"
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          ignorePreflightErrors:
          - ImagePull
          kubeletExtraArgs:
            cloud-provider: external
            volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          name: '{{ local_hostname }}'
        patches:
          directory: /etc/kubernetes/patches
      preKubeadmCommands:
      - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
      - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts
      - echo "127.0.0.1   {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
      - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
      - /usr/local/bin/capv-load-local-images.sh
      postKubeadmCommands:
      - chmod 600 /var/lib/kubelet/config.yaml
      users:
      - name: boot
        sudo: ALL=(ALL) NOPASSWD:ALL
        sshAuthorizedKeys:
        - "<ssh_public_key>"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <cluster_name>-md-0
  namespace: <namespace>
spec:
  clusterName: <cluster_name>
  replicas: <worker_replicas>
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  selector:
    matchLabels:
      nodepool: md-0
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: <cluster_name>
        nodepool: md-0
    spec:
      clusterName: <cluster_name>
      version: "<k8s_version>"
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <cluster_name>-worker-bootstrap
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: <cluster_name>-worker

Apply the manifest:

kubectl apply -f 30-workers-md-0.yaml

In the baseline workflow, note the following worker-related rules:

  • failureDomain is not set in the main worker manifest by default, because the baseline workflow assumes a single datacenter. If you need the worker MachineDeployment to land in a specific VSphereDeploymentZone, add failureDomain as described in Extension Scenarios.
  • Some environments add extra runtime-image replacement commands or service restart commands to the KubeadmConfigTemplate. The baseline example intentionally omits them. Add them only when the platform requirements of your environment explicitly call for them.

Wait for the cluster to become ready

After all manifests are applied, cluster creation proceeds asynchronously. Monitor the progress with the following command:

kubectl -n <namespace> get cluster,kubeadmcontrolplane,machinedeployment,machine -w

Before moving on to verification, wait until the KubeadmControlPlane reports the expected number of ready replicas and all Machine objects reach the Running phase.
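
A minimal point-in-time check for those two conditions:

kubectl -n <namespace> get kubeadmcontrolplane <cluster_name> -o jsonpath='{.status.readyReplicas}'
kubectl -n <namespace> get machine -l cluster.x-k8s.io/cluster-name=<cluster_name>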

Verification

Use the following commands to verify the cluster creation workflow.

  1. Check the CPI delivery resources in the management cluster:
    kubectl -n <namespace> get clusterresourceset
    kubectl -n <namespace> get clusterresourcesetbinding
  2. Export the workload cluster kubeconfig:
    kubectl -n <namespace> get secret <cluster_name>-kubeconfig -o jsonpath='{.data.value}' | base64 -d > /tmp/<cluster_name>.kubeconfig
  3. Check that the vSphere CPI daemonset was created in the workload cluster:
    kubectl --kubeconfig=/tmp/<cluster_name>.kubeconfig -n kube-system get daemonset
  4. Check the management cluster objects:
    kubectl -n <namespace> get cluster,vspherecluster,kubeadmcontrolplane,machinedeployment,machine,vspheremachine,vspherevm
  5. Check the workload cluster nodes:
    kubectl --kubeconfig=/tmp/<cluster_name>.kubeconfig get nodes -o wide

Confirm the following results:

  • vsphere-cloud-controller-manager appears in the workload cluster.
  • The control plane nodes and worker nodes are created.
  • The nodes eventually become Ready.

Troubleshooting

When the workflow fails, start with the following commands:

kubectl -n <namespace> describe cluster <cluster_name>
kubectl -n <namespace> describe vspherecluster <cluster_name>
kubectl -n <namespace> describe kubeadmcontrolplane <cluster_name>
kubectl -n <namespace> describe machinedeployment <cluster_name>-md-0
kubectl -n <namespace> get cluster,vspherecluster,kubeadmcontrolplane,machinedeployment,machine,vspheremachine,vspherevm
kubectl -n cpaas-system logs deploy/capi-controller-manager

Check the following items first:

  • If the CPI resources are not delivered, check the ClusterResourceSet=true feature gate, the ClusterResourceSet, and the ClusterResourceSetBinding.
  • If the ClusterResourceSet exists but no ClusterResourceSetBinding is created, check whether the controller has the required delete permissions on the referenced ConfigMap and Secret resources.
  • If the network plugin is not installed, check whether the required cluster annotations exist and whether the platform controllers have processed them.
  • If the cpaas.io/registry-address annotation is missing, check the public registry credential and the platform controller responsible for injecting the annotation.
  • If a machine stays in Provisioning, check the MachineConfigPoolReady condition on the VSphereMachine; it shows whether slot assignment failed because of pool binding or a datacenter mismatch.
  • If a VM keeps waiting for an IP address, check VMware Tools, the static IP settings, and VSphereVM.status.addresses (see the sketch after this list).
  • If the datastore runs out of space, check whether old VM directories or .vmdk files remain in the target datastore.
  • If the template system disk size does not match the manifest value, check the actual clone mode first. When a VM is created with linkedClone, the system disk keeps the template size and diskGiB is ignored. Only fullClone uses diskGiB, and in that case diskGiB must not be smaller than the template disk size.
  • If the control plane endpoint does not come up, check the load balancer, the VIP, and port 6443.
  • If TLS to vCenter fails, check the thumbprint, the vCenter address, and whether proxy settings interfere with the connection.
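
A minimal sketch for inspecting the addresses and conditions mentioned above; <vspherevm_name> and <vspheremachine_name> are placeholders for the object names reported by the get commands:

kubectl -n <namespace> get vspherevm <vspherevm_name> -o jsonpath='{.status.addresses}'
kubectl -n <namespace> describe vspheremachine <vspheremachine_name>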

When reviewing controller logs, follow these rules:

  • deploy/capi-controller-manager runs in the cpaas-system namespace of the global cluster.
  • Do not use the workload cluster kubeconfig to view the capi-controller-manager logs.
  • If platform controllers process the cluster network annotations, also check the logs of the platform network-controller and the platform cluster-lifecycle-controller.

Next steps

Once the baseline topology is running, continue with Extension Scenarios if you need a second NIC, multiple datacenters, failure domains, additional data disks, or more worker replicas.