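Creating a VMware vSphere Cluster in the global Cluster

This document explains how to create a VMware vSphere workload cluster from the global cluster, using the standard CAPV mode that connects directly to vCenter. The procedure covers the minimum supported topology: one datacenter, one NIC per node, and static IP allocation through VSphereMachineConfigPool.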

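Scenarios

Use this document in the following scenarios:

  • You want to create the first basic VMware vSphere workload cluster in your environment.
  • You use one datacenter and one NIC per node for initial validation.
  • You want to keep the first deployment as simple as possible before enabling advanced placement or networking features.

This document applies to the following deployment model:

  • CAPV connects directly to vCenter.
  • Control plane nodes and worker nodes both use VSphereMachineConfigPool for static IP allocation and data disk configuration.
  • ClusterResourceSet delivers the vSphere CPI component automatically.
  • The first validation uses one datacenter and one NIC per node.

This document does not apply to the following scenarios:

  • Deployments that depend on vSphere Supervisor or vm-operator.
  • Deployments that do not use VSphereMachineConfigPool.
  • First deployments that enable multiple datacenters, multiple NICs, and complex disk extensions at the same time.

This document is written for the current platform environment. The kube-ovn delivery path depends on platform controllers that consume annotations on the Cluster resource, so this workflow is not intended as a general standalone CAPV deployment guide outside the platform context.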

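Prerequisites

Before you begin, make sure the following conditions are met:

  1. You have completed the value collection described in Preparing Parameters for a VMware vSphere Cluster.
  2. The global cluster can reach vCenter.
  3. The target template, network, datastore, and vCenter resource pool are available.
  4. The control plane VIP and load balancer are ready.
  5. All required static IP addresses have been allocated and are not in use.
  6. ClusterResourceSet=true is enabled.
  7. The platform has a valid public image registry configuration.
  8. The platform can process the cluster annotations required to install the network plugin.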

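Key Objects

ClusterResourceSet

ClusterResourceSet is a Cluster API resource in the global cluster. After the workload API server becomes reachable, it applies the referenced ConfigMap and Secret resources to the workload cluster.

In this workflow, ClusterResourceSet is used to deliver the vSphere CPI resources automatically.

vSphere CPI Component

The vSphere CPI component is delivered to the workload cluster through ClusterResourceSet. It connects the workload nodes to the vSphere infrastructure so that the cluster can report infrastructure identity and complete cloud-provider initialization.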

machine config pool

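The machine config pool is the VSphereMachineConfigPool custom resource. In the basic workflow:

  • One machine config pool is used for control plane nodes.
  • One machine config pool is used for worker nodes.

Each node slot contains a hostname, a datacenter, a static IP assignment, and optional data disk definitions.

For network configuration, distinguish the following fields:

  • networkName is the vCenter network or port group name.
  • deviceName is the NIC name inside the guest operating system.

If deviceName is set, CAPV writes the value into the generated guest-network metadata. If it is omitted, the current implementation typically uses NIC names such as eth0, eth1, and eth2, in NIC order.

Also distinguish the following value formats:

  • The node IP address is written together with the prefix length, for example 10.10.10.11/24.
  • The gateway field contains only the gateway IP address, for example 10.10.10.1.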
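
The two value formats above are easy to mix up when filling in the pool manifests, so a pre-check can help. The following is a minimal sketch, assuming IPv4 addresses only; the helper names is_ipv4, is_node_ip, and is_gateway are illustrative and are not part of CAPV or the platform.

```shell
#!/bin/sh
# is_ipv4 ADDRESS: succeeds when ADDRESS is a dotted-quad IPv4 address.
is_ipv4() {
  case $1 in
    ''|*[!0-9.]*|.*|*.|*..*) return 1 ;;
  esac
  old_ifs=$IFS
  IFS=.
  set -- $1
  IFS=$old_ifs
  [ "$#" -eq 4 ] || return 1
  for octet in "$@"; do
    [ "${#octet}" -le 3 ] && [ "$octet" -le 255 ] || return 1
  done
}

# is_node_ip VALUE: succeeds for "<ipv4>/<prefix>" with a prefix of 0 to 32,
# which is the format expected for the node ip field.
is_node_ip() {
  case $1 in
    */*) ;;
    *) return 1 ;;
  esac
  addr=${1%/*}
  prefix=${1##*/}
  case $prefix in
    ''|*[!0-9]*) return 1 ;;
  esac
  [ "${#prefix}" -le 2 ] && [ "$prefix" -le 32 ] || return 1
  is_ipv4 "$addr"
}

# is_gateway VALUE: succeeds for a bare IPv4 address without a prefix length,
# which is the format expected for the gateway field.
is_gateway() {
  is_ipv4 "$1"
}

is_node_ip "10.10.10.11/24" && echo "node IP format ok"
is_gateway "10.10.10.1" && echo "gateway format ok"
```

A value such as 10.10.10.1/24 in the gateway field, or 10.10.10.11 without a prefix in the ip field, is rejected by these checks.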

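VM Template Requirements

The VM template used in this workflow should meet the following minimum requirements:

  1. It uses the operating system required by the target platform environment.
  2. It includes cloud-init.
  3. It includes VMware Tools or open-vm-tools.
  4. It includes containerd.
  5. It includes the base components required for kubeadm bootstrap.
  6. It contains pre-exported container image tar files under /root/images/. Before kubeadm runs, capv-load-local-images.sh imports these files into containerd so that node bootstrap does not depend on pulling images from a remote registry.
  7. The /root/images/*.tar files must include the sandbox (pause) image, and its reference must exactly match the sandbox_image value (containerd v1) or the sandbox value (containerd v2) configured in /etc/containerd/config.toml. For example, if containerd is configured with sandbox_image = "registry.example.com/tkestack/pause:3.10", one of the tar files must contain exactly that image reference. A mismatch causes containerd to pull the sandbox image over the network, which defeats the purpose of local preloading and fails in air-gapped environments.

Static IP configuration, hostname injection, and other initialization settings all depend on cloud-init. Node IP reporting depends on the guest tools.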
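
The sandbox image requirement in item 7 can be checked on a template VM before it is converted to a template. The following is a minimal sketch, assuming the config file stores the value on a single line in either double or single quotes; the helper names sandbox_image_from_config and image_list_contains are illustrative.

```shell
#!/bin/sh
# sandbox_image_from_config FILE: prints the first sandbox_image (containerd v1)
# or sandbox (containerd v2) value found in a containerd config file.
sandbox_image_from_config() {
  sed -n -E "s/^[[:space:]]*(sandbox_image|sandbox)[[:space:]]*=[[:space:]]*[\"']([^\"']+)[\"'].*\$/\\2/p" "$1" | head -n 1
}

# image_list_contains REF: reads image references on stdin, one per line,
# and succeeds only when REF appears as an exact, whole-line match.
image_list_contains() {
  grep -Fxq -- "$1"
}

# On a node (requires containerd and the ctr client, after the tar files
# have been imported):
#   ref=$(sandbox_image_from_config /etc/containerd/config.toml)
#   ctr -n k8s.io images ls -q | image_list_contains "$ref" \
#     && echo "sandbox image is preloaded" \
#     || echo "sandbox image $ref is missing"
```

The exact-match comparison mirrors the requirement that the reference be identical, including the registry host and the tag.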

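Local File Layout

Workload Cluster Naming

The workload cluster_name must not be global. That name is reserved for the global cluster, and reusing it causes workload cluster resources to conflict with the global cluster resources in cpaas-system. The global- prefix is reserved for resources owned by the DR workflow of the global cluster; see Common Prerequisites. Do not use global- for workload cluster resources, because failover operations may select those resources as if they belonged to the global cluster.

By convention, the CAPI Cluster and the provider cluster resource (VSphereCluster) should have exactly the same name as <cluster_name>, while non-root CAPI and provider resources (KubeadmControlPlane, KubeadmConfigTemplate, MachineDeployment, VSphereMachineTemplate, VSphereMachineConfigPool, and so on) should be named with the <cluster_name>- prefix. For example, the sample manifests use <cluster_name>-kcp and <cluster_name>-md-0. This is a recommendation rather than a rule enforced by the controllers, but it avoids same-namespace conflicts when multiple workload clusters coexist in cpaas-system, and it makes resource ownership easier to see during operations.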
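
The naming rules above can be checked before any manifest is written. The following is a minimal sketch; the helper names check_cluster_name and derived_names are illustrative, the cluster name demo is a made-up example, and the suffixes are the ones used by the sample manifests in this document.

```shell
#!/bin/sh
# check_cluster_name NAME: prints "ok" for an acceptable name, or the reason
# for rejection. Rejects the reserved name "global" and the "global-" prefix.
check_cluster_name() {
  case $1 in
    '') echo "empty name"; return 1 ;;
    global) echo "reserved name"; return 1 ;;
    global-*) echo "reserved prefix"; return 1 ;;
  esac
  echo "ok"
}

# derived_names NAME: prints resource names that follow the
# <cluster_name>-<suffix> convention used by the sample manifests.
derived_names() {
  for suffix in kcp md-0 cp-pool worker-pool control-plane worker worker-bootstrap; do
    printf '%s-%s\n' "$1" "$suffix"
  done
}

check_cluster_name "demo"
derived_names "demo"
```

This only covers the two reserved values named in this section; it does not replace the name validation performed by the Kubernetes API server.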

Create a local working directory and save the manifests in the following layout:

capv-cluster/
├── 00-namespace.yaml
├── 01-vsphere-credentials-secret.yaml
├── 02-vspheremachineconfigpool-control-plane.yaml
├── 03-vspheremachineconfigpool-worker.yaml
├── 10-cluster.yaml
├── 15-vsphere-cpi-clusterresourceset.yaml
├── 20-control-plane.yaml
└── 30-workers-md-0.yaml

Create the directory with the following commands:

mkdir -p ./capv-cluster
cd ./capv-cluster

Procedure

Verify the Environment

Run the following commands in the global cluster to verify the minimum prerequisites:

kubectl get ns
kubectl get minfo -l cpaas.io/module-name=cluster-api-provider-vsphere
kubectl get minfo -l cpaas.io/module-name=cluster-api-provider-kubeadm
kubectl -n cpaas-system get deploy capi-controller-manager -o jsonpath='{.spec.template.spec.containers[0].args}'
kubectl -n cpaas-system get secret public-registry-credential -o jsonpath='{.data.content}'

Confirm the following results:

  • The global cluster is reachable.
  • Alauda Container Platform Kubeadm Provider and Alauda Container Platform VMware vSphere Infrastructure Provider are running.
  • The controller arguments include ClusterResourceSet=true.
  • The public registry credential data.content is not empty.
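
The check for ClusterResourceSet=true can be scripted instead of read by eye. The following is a minimal sketch, assuming the controller arguments are printed as one JSON-style array by the jsonpath query; the helper name has_crs_gate and the sample argument string are illustrative and are not taken from the platform.

```shell
#!/bin/sh
# has_crs_gate: reads controller arguments on stdin and succeeds when
# ClusterResourceSet=true appears in them as a complete token.
has_crs_gate() {
  grep -Eq '(^|[^A-Za-z])ClusterResourceSet=true([^A-Za-z]|$)'
}

# Sample input for illustration only; real values come from the Deployment.
sample_args='["--leader-elect","--feature-gates=MachinePool=true,ClusterResourceSet=true"]'
if printf '%s\n' "$sample_args" | has_crs_gate; then
  echo "ClusterResourceSet gate enabled"
else
  echo "ClusterResourceSet gate missing"
fi

# Against the live cluster (requires kubectl access to the global cluster):
#   kubectl -n cpaas-system get deploy capi-controller-manager \
#     -o jsonpath='{.spec.template.spec.containers[0].args}' | has_crs_gate
```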

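Before continuing, also verify the following items:

  • The vCenter server address is reachable.
  • The vCenter username and password are valid.
  • The thumbprint is correct.
  • The template name is correct.
  • The template can be resolved in the target datacenter.
  • If VMs are cloned as fullClone, the template system disk is not larger than the diskGiB value used in the later manifests. If CAPV completes a linkedClone, the system disk keeps the template size and diskGiB is ignored.
  • VMware Tools or open-vm-tools is installed in the template.
  • The VIP exists, and port 6443 is reachable from the execution environment.
  • The ownership model for real-server maintenance on the load balancer is clearly defined.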

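Create the Namespace and the vCenter Credentials Secret

Create the namespace that stores the workload cluster objects.

This workflow stores the workload cluster objects in the cpaas-system namespace. In the manifests and commands below, replace every <namespace> placeholder with cpaas-system.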
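
The placeholder substitution can be done with sed rather than by hand. The following is a minimal sketch that only handles the <namespace> placeholder; the helper name render_namespace is illustrative, and the demonstration works on a temporary copy of the namespace manifest so that no real file is modified.

```shell
#!/bin/sh
# render_namespace FILE: prints FILE with every <namespace> placeholder
# replaced by cpaas-system, leaving the source file untouched.
render_namespace() {
  sed 's/<namespace>/cpaas-system/g' "$1"
}

# Demonstration on a throwaway copy of the namespace manifest.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: <namespace>
EOF
render_namespace "$demo_file"
rm -f "$demo_file"
```

To render a real manifest, redirect the output to a new file and review it before applying. Other placeholders, such as <cluster_name>, still need their own values.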

00-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: <namespace>

Create the vCenter credentials Secret referenced by VSphereCluster.spec.identityRef.

01-vsphere-credentials-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: <credentials_secret_name>
  namespace: <namespace>
type: Opaque
stringData:
  username: "<vsphere_username>"
  password: "<vsphere_password>"

Apply both manifests:

kubectl apply -f 00-namespace.yaml
kubectl apply -f 01-vsphere-credentials-secret.yaml

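Create the Cluster and VSphereCluster Objects

Create the base cluster manifest, which contains the workload cluster network settings, the control plane endpoint, and the vCenter connection settings.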

10-cluster.yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  name: <cluster_name>
  namespace: <namespace>
  labels:
    cluster.x-k8s.io/cluster-name: <cluster_name>
    cluster-type: VSphere
    addons.cluster.x-k8s.io/vsphere-cpi: "enabled"
  annotations:
    capi.cpaas.io/resource-group-version: infrastructure.cluster.x-k8s.io/v1beta1
    capi.cpaas.io/resource-kind: VSphereCluster
    cpaas.io/sentry-deploy-type: Baremetal
    cpaas.io/alb-address-type: ClusterAddress
    cpaas.io/network-type: kube-ovn
    cpaas.io/kube-ovn-version: <kube_ovn_version>
    cpaas.io/kube-ovn-join-cidr: <kube_ovn_join_cidr>
spec:
  clusterNetwork:
    pods:
      cidrBlocks:
      - <pod_cidr>
    services:
      cidrBlocks:
      - <service_cidr>
  controlPlaneRef:
    apiVersion: controlplane.cluster.x-k8s.io/v1beta1
    kind: KubeadmControlPlane
    name: <cluster_name>-kcp
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: VSphereCluster
    name: <cluster_name>
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereCluster
metadata:
  name: <cluster_name>
  namespace: <namespace>
spec:
  controlPlaneEndpoint:
    host: "<vip>"
    port: <api_server_port>
  identityRef:
    kind: Secret
    name: <credentials_secret_name>
  server: "<vsphere_server>"
  thumbprint: "<thumbprint>"

Apply the manifest:

kubectl apply -f 10-cluster.yaml

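Create the vSphere CPI Delivery Resources

Create a ClusterResourceSet so that the workload cluster automatically receives the vSphere CPI configuration and manifests after the workload API server becomes reachable.

INFO

In the basic workflow, VSphereCluster.spec.failureDomainSelector is intentionally left unset, and the CPI vsphere.conf does not contain a [Labels] section. Both are needed only after failure domains are enabled; configure them together as described in Extension Scenarios. Adding [Labels] to vsphere.conf without matching VSphereFailureDomain objects causes the CPI to look up zone and region tags that do not exist.

WARNING

The CPI ConfigMap, Secret, and ClusterResourceSet resources must be created in the same namespace as the Cluster resource. In this guide, that namespace is cpaas-system. A ClusterResourceSet can only match clusters in its own namespace; deploying it to a different namespace silently prevents resource delivery.

INFO

The kube-ovn configuration in the Cluster annotations is consumed by platform controllers. This document does not install the network plugin directly.

TIP

The manifest is long and nests YAML inside data fields. Validate the manifest before applying it: kubectl apply --dry-run=client -f 15-vsphere-cpi-clusterresourceset.yaml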

15-vsphere-cpi-clusterresourceset.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster_name>-vsphere-cpi-config
  namespace: <namespace>
data:
  data: |
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cloud-config
      namespace: kube-system
    data:
      vsphere.conf: |
        [Global]
        secret-name = "vsphere-cloud-secret"
        secret-namespace = "kube-system"
        service-account = "cloud-controller-manager"
        port = "443"
        insecure-flag = "<cpi_insecure_flag>"
        datacenters = "<cpi_datacenters>"

        [VirtualCenter "<vsphere_server>"]
---
apiVersion: v1
kind: Secret
metadata:
  name: <cluster_name>-vsphere-cpi-secret
  namespace: <namespace>
type: addons.cluster.x-k8s.io/resource-set
stringData:
  data: |
    apiVersion: v1
    kind: Secret
    metadata:
      name: vsphere-cloud-secret
      namespace: kube-system
    type: Opaque
    stringData:
      <vsphere_server>.username: <vsphere_username>
      <vsphere_server>.password: <vsphere_password>
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: <cluster_name>-vsphere-cpi-manifests
  namespace: <namespace>
data:
  data: |
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: cloud-controller-manager
      namespace: kube-system
    automountServiceAccountToken: false
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: system:cloud-controller-manager
    rules:
    - apiGroups: [""]
      resources: ["events"]
      verbs: ["create", "patch", "update"]
    - apiGroups: [""]
      resources: ["nodes"]
      verbs: ["*"]
    - apiGroups: [""]
      resources: ["nodes/status"]
      verbs: ["patch"]
    - apiGroups: [""]
      resources: ["services"]
      verbs: ["list", "patch", "update", "watch"]
    - apiGroups: [""]
      resources: ["services/status"]
      verbs: ["patch"]
    - apiGroups: [""]
      resources: ["serviceaccounts"]
      verbs: ["create", "get", "list", "watch", "update"]
    - apiGroups: [""]
      resources: ["persistentvolumes"]
      verbs: ["get", "list", "update", "watch"]
    - apiGroups: [""]
      resources: ["endpoints"]
      verbs: ["create", "get", "list", "watch", "update"]
    - apiGroups: [""]
      resources: ["secrets"]
      verbs: ["get", "list", "watch"]
    - apiGroups: ["coordination.k8s.io"]
      resources: ["leases"]
      verbs: ["get", "list", "watch", "create", "update"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: servicecatalog.k8s.io:apiserver-authentication-reader
      namespace: kube-system
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: extension-apiserver-authentication-reader
    subjects:
    - apiGroup: ""
      kind: ServiceAccount
      name: cloud-controller-manager
      namespace: kube-system
    - apiGroup: ""
      kind: User
      name: cloud-controller-manager
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: system:cloud-controller-manager
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: system:cloud-controller-manager
    subjects:
    - kind: ServiceAccount
      name: cloud-controller-manager
      namespace: kube-system
    - kind: User
      name: cloud-controller-manager
    ---
    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        component: cloud-controller-manager
        tier: control-plane
        k8s-app: vsphere-cloud-controller-manager
      name: vsphere-cloud-controller-manager
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          k8s-app: vsphere-cloud-controller-manager
      updateStrategy:
        type: RollingUpdate
      template:
        metadata:
          labels:
            component: cloud-controller-manager
            k8s-app: vsphere-cloud-controller-manager
        spec:
          securityContext:
            runAsUser: 1001
          automountServiceAccountToken: true
          # Optional: required when the CPI image is stored in a private
          # registry that needs authentication. The platform automatically
          # syncs a dockerconfigjson secret named "global-registry-auth"
          # into every namespace of the workload cluster when the
          # `global` cluster secret "public-registry-credential"
          # (data.content) is configured. If your environment does not
          # use a private registry, remove the imagePullSecrets block.
          imagePullSecrets:
          - name: global-registry-auth
          serviceAccountName: cloud-controller-manager
          hostNetwork: true
          tolerations:
          - operator: Exists
          - key: node.cloudprovider.kubernetes.io/uninitialized
            value: "true"
            effect: NoSchedule
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          - key: node.kubernetes.io/not-ready
            effect: NoSchedule
            operator: Exists
          containers:
          - name: vsphere-cloud-controller-manager
            image: <image_registry>/ait/cloud-provider-vsphere:<cpi_image_tag>
            args:
            - --v=2
            - --cloud-provider=vsphere
            - --cloud-config=/etc/cloud/vsphere.conf
            volumeMounts:
            - mountPath: /etc/cloud
              name: vsphere-config-volume
              readOnly: true
            resources:
              requests:
                cpu: 200m
          volumes:
          - name: vsphere-config-volume
            configMap:
              name: cloud-config
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        component: cloud-controller-manager
      name: vsphere-cloud-controller-manager
      namespace: kube-system
    spec:
      type: NodePort
      ports:
      - port: 43001
        protocol: TCP
        targetPort: 43001
      selector:
        component: cloud-controller-manager
---
apiVersion: addons.cluster.x-k8s.io/v1beta1
kind: ClusterResourceSet
metadata:
  name: <cluster_name>-vsphere-cpi
  namespace: <namespace>
spec:
  strategy: Reconcile
  clusterSelector:
    matchLabels:
      addons.cluster.x-k8s.io/vsphere-cpi: "enabled"
  resources:
  - name: <cluster_name>-vsphere-cpi-config
    kind: ConfigMap
  - name: <cluster_name>-vsphere-cpi-secret
    kind: Secret
  - name: <cluster_name>-vsphere-cpi-manifests
    kind: ConfigMap

Apply the manifest:

kubectl apply -f 15-vsphere-cpi-clusterresourceset.yaml

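Create the Machine Config Pools

Create the control plane machine config pool.

INFO

Each node slot declares its NIC layout under network.primary (required) and declares extra NICs under network.additional (an optional list). The networkName of the primary NIC is required. The provider derives the Kubernetes node name, the DNS SAN of the kubelet serving certificate, and the kubelet node-ip from the hostname and the resolved primary NIC address. The hostname must be a valid DNS-1123 subdomain.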
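
Because the hostname must be a valid DNS-1123 subdomain, it can be worth checking each planned hostname before writing the pool manifest. The following is a minimal sketch using the Kubernetes DNS-1123 subdomain rule (lowercase alphanumerics, hyphens, and dots, starting and ending with an alphanumeric, at most 253 characters); the helper name is_dns1123_subdomain and the sample hostnames are illustrative.

```shell
#!/bin/sh
# is_dns1123_subdomain NAME: succeeds when NAME is 1 to 253 characters long
# and consists of dot-separated labels made of lowercase alphanumerics and
# hyphens, each label starting and ending with an alphanumeric.
is_dns1123_subdomain() {
  [ "${#1}" -ge 1 ] && [ "${#1}" -le 253 ] || return 1
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

for name in "cp-node-1" "cp-node-1.example.com" "CP_Node_1"; do
  if is_dns1123_subdomain "$name"; then
    echo "$name: valid"
  else
    echo "$name: invalid"
  fi
done
```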

INFO

deviceName is optional. If you do not need to force the guest NIC name, remove the deviceName line from each node slot. The provider then assigns NIC names in NIC order, for example eth0 and eth1.

02-vspheremachineconfigpool-control-plane.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineConfigPool
metadata:
  name: <cluster_name>-cp-pool
  namespace: <namespace>
spec:
  clusterRef:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: <cluster_name>
  datacenter: "<default_datacenter>"
  releaseDelayHours: <release_delay_hours>
  configs:
  - hostname: "<cp_node_name_1>"
    datacenter: "<master_01_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_01_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true
  - hostname: "<cp_node_name_2>"
    datacenter: "<master_02_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_02_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true
  - hostname: "<cp_node_name_3>"
    datacenter: "<master_03_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<master_03_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <cp_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <cp_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4
    - name: var-lib-etcd
      sizeGiB: <cp_var_lib_etcd_size_gib>
      mountPath: /var/lib/etcd
      fsFormat: ext4
      wipeFilesystem: true

Create the worker machine config pool.

03-vspheremachineconfigpool-worker.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineConfigPool
metadata:
  name: <cluster_name>-worker-pool
  namespace: <namespace>
spec:
  clusterRef:
    apiVersion: cluster.x-k8s.io/v1beta1
    kind: Cluster
    name: <cluster_name>
  datacenter: "<default_datacenter>"
  releaseDelayHours: <release_delay_hours>
  configs:
  - hostname: "<worker_node_name_1>"
    datacenter: "<worker_01_datacenter>"
    network:
      primary:
        networkName: "<nic1_network_name>"
        deviceName: "<nic1_device_name>"
        ip: "<worker_01_nic1_ip>/<nic1_prefix>"
        gateway: "<nic1_gateway>"
        dns:
        - "<nic1_dns_1>"
    persistentDisks:
    - name: var-cpaas
      sizeGiB: <worker_var_cpaas_size_gib>
      mountPath: /var/cpaas
      fsFormat: ext4
    - name: var-lib-containerd
      sizeGiB: <worker_var_lib_containerd_size_gib>
      mountPath: /var/lib/containerd
      fsFormat: ext4

Apply both manifests:

kubectl apply -f 02-vspheremachineconfigpool-control-plane.yaml
kubectl apply -f 03-vspheremachineconfigpool-worker.yaml

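Create the Control Plane Objects

Create the VSphereMachineTemplate and KubeadmControlPlane objects. Replace the placeholders in the full template below with the values collected in the checklist document.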
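
One of the placeholders in the control plane template, <encryption_provider_secret>, is not a collected value but a generated one. The comment in the manifest suggests head -c 32 /dev/urandom | base64. The following is a minimal sketch that wraps this command and confirms that the result decodes back to 32 bytes; the helper name generate_encryption_key is illustrative.

```shell
#!/bin/sh
# generate_encryption_key: prints 32 random bytes, base64-encoded,
# using the command from the manifest comment.
generate_encryption_key() {
  head -c 32 /dev/urandom | base64
}

key=$(generate_encryption_key)

# Decode the key again and count the bytes as a sanity check.
decoded_len=$(printf '%s' "$key" | base64 --decode | wc -c | tr -d ' ')
echo "decoded key length: ${decoded_len} bytes"
```

The sketch deliberately does not print the key. Treat the generated value as a secret when you paste it into the manifest.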

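The template keeps both cloneMode and diskGiB because CAPV accepts both fields. In practice, diskGiB affects the system disk only when the actual clone operation is a fullClone. If cloneMode is linkedClone and the template has a usable snapshot, CAPV completes a linked clone, and the system disk keeps the same size as the source template. If no usable snapshot exists, CAPV falls back to fullClone, and diskGiB takes effect again.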
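
The interaction between cloneMode, snapshots, and diskGiB can be summarized as a small decision function. The following is a sketch of the behavior described in this section, not CAPV code; the helper names effective_clone and disk_gib_applies, and the true/false snapshot argument, are illustrative.

```shell
#!/bin/sh
# effective_clone MODE HAS_SNAPSHOT: prints the clone operation that is
# expected to run. A linked clone needs both cloneMode=linkedClone and a
# usable snapshot on the template; every other case is a full clone.
effective_clone() {
  if [ "$1" = "linkedClone" ] && [ "$2" = "true" ]; then
    echo "linkedClone"
  else
    echo "fullClone"
  fi
}

# disk_gib_applies MODE HAS_SNAPSHOT: succeeds when diskGiB is expected to
# affect the system disk, which is only the case for a full clone.
disk_gib_applies() {
  [ "$(effective_clone "$1" "$2")" = "fullClone" ]
}

for mode in fullClone linkedClone; do
  for snapshot in true false; do
    if disk_gib_applies "$mode" "$snapshot"; then
      note="diskGiB applies"
    else
      note="diskGiB ignored, template disk size kept"
    fi
    echo "cloneMode=$mode snapshot=$snapshot -> $(effective_clone "$mode" "$snapshot"): $note"
  done
done
```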

20-control-plane.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: <cluster_name>-control-plane
  namespace: <namespace>
spec:
  template:
    spec:
      server: "<vsphere_server>"
      template: "<template_name>"
      cloneMode: <clone_mode>
      folder: "<vm_folder>"
      datastore: "<cp_system_datastore>"
      diskGiB: <cp_system_disk_gib>
      memoryMiB: <cp_memory_mib>
      numCPUs: <cp_num_cpus>
      os: Linux
      powerOffMode: <power_off_mode>
      network:
        devices:
        - networkName: "<nic1_network_name>"
      machineConfigPoolRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineConfigPool
        name: <cluster_name>-cp-pool
        namespace: <namespace>
---
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: <cluster_name>-kcp
  namespace: <namespace>
spec:
  rolloutStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
  version: "<k8s_version>"
  replicas: <cp_replicas>
  machineTemplate:
    nodeDrainTimeout: 1m
    nodeDeletionTimeout: 5m
    metadata:
      labels:
        node-role.kubernetes.io/control-plane: ""
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
      kind: VSphereMachineTemplate
      name: <cluster_name>-control-plane
  kubeadmConfigSpec:
    users:
    - name: boot
      sudo: ALL=(ALL) NOPASSWD:ALL
      sshAuthorizedKeys:
      - "<ssh_public_key>"
    files:
    - path: /etc/kubernetes/admission/psa-config.yaml
      owner: "root:root"
      permissions: "0644"
      content: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: AdmissionConfiguration
        plugins:
        - name: PodSecurity
          configuration:
            apiVersion: pod-security.admission.config.k8s.io/v1
            kind: PodSecurityConfiguration
            defaults:
              enforce: "privileged"
              enforce-version: "latest"
              audit: "baseline"
              audit-version: "latest"
              warn: "baseline"
              warn-version: "latest"
            exemptions:
              usernames: []
              runtimeClasses: []
              namespaces:
              - kube-system
              - <namespace>
    - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
      owner: "root:root"
      permissions: "0644"
      content: |
        {
          "apiVersion": "kubelet.config.k8s.io/v1beta1",
          "kind": "KubeletConfiguration",
          "protectKernelDefaults": true,
          "streamingConnectionIdleTimeout": "5m",
          "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
          "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key"
        }
    # Generate the encryption key with: head -c 32 /dev/urandom | base64
    - path: /etc/kubernetes/encryption-provider.conf
      owner: "root:root"
      append: false
      permissions: "0644"
      content: |
        apiVersion: apiserver.config.k8s.io/v1
        kind: EncryptionConfiguration
        resources:
        - resources:
          - secrets
          providers:
          - aescbc:
              keys:
              - name: key1
                secret: <encryption_provider_secret>
    - path: /etc/kubernetes/audit/policy.yaml
      owner: "root:root"
      append: false
      permissions: "0644"
      content: |
        apiVersion: audit.k8s.io/v1
        kind: Policy
        omitStages:
        - "RequestReceived"
        rules:
        - level: None
          users:
          - system:kube-controller-manager
          - system:kube-scheduler
          - system:serviceaccount:kube-system:endpoint-controller
          verbs: ["get", "update"]
          namespaces: ["kube-system"]
          resources:
          - group: ""
            resources: ["endpoints"]
        - level: None
          nonResourceURLs:
          - /healthz*
          - /version
          - /swagger*
        - level: None
          resources:
          - group: ""
            resources: ["events"]
        - level: None
          resources:
          - group: "devops.alauda.io"
        - level: None
          verbs: ["get", "list", "watch"]
        - level: None
          resources:
          - group: "coordination.k8s.io"
            resources: ["leases"]
        - level: None
          resources:
          - group: "authorization.k8s.io"
            resources: ["subjectaccessreviews", "selfsubjectaccessreviews"]
          - group: "authentication.k8s.io"
            resources: ["tokenreviews"]
        - level: None
          resources:
          - group: "app.alauda.io"
            resources: ["imagewhitelists"]
          - group: "k8s.io"
            resources: ["namespaceoverviews"]
        - level: Metadata
          resources:
          - group: ""
            resources: ["secrets", "configmaps"]
        - level: Metadata
          resources:
          - group: "operator.connectors.alauda.io"
            resources: ["installmanifests"]
          - group: "operators.katanomi.dev"
            resources: ["katanomis"]
        - level: RequestResponse
          resources:
          - group: ""
          - group: "aiops.alauda.io"
          - group: "apps"
          - group: "app.k8s.io"
          - group: "authentication.istio.io"
          - group: "auth.alauda.io"
          - group: "autoscaling"
          - group: "asm.alauda.io"
          - group: "clusterregistry.k8s.io"
          - group: "crd.alauda.io"
          - group: "infrastructure.alauda.io"
          - group: "monitoring.coreos.com"
          - group: "operators.coreos.com"
          - group: "networking.istio.io"
          - group: "extensions.istio.io"
          - group: "install.istio.io"
          - group: "security.istio.io"
          - group: "telemetry.istio.io"
          - group: "opentelemetry.io"
          - group: "networking.k8s.io"
          - group: "portal.alauda.io"
          - group: "rbac.authorization.k8s.io"
          - group: "storage.k8s.io"
          - group: "tke.cloud.tencent.com"
          - group: "devopsx.alauda.io"
          - group: "core.katanomi.dev"
          - group: "deliveries.katanomi.dev"
          - group: "integrations.katanomi.dev"
          - group: "artifacts.katanomi.dev"
          - group: "builds.katanomi.dev"
          - group: "versioning.katanomi.dev"
          - group: "sources.katanomi.dev"
          - group: "tekton.dev"
          - group: "operator.tekton.dev"
          - group: "eventing.knative.dev"
          - group: "flows.knative.dev"
          - group: "messaging.knative.dev"
          - group: "operator.knative.dev"
          - group: "sources.knative.dev"
          - group: "operator.devops.alauda.io"
          - group: "flagger.app"
          - group: "jaegertracing.io"
          - group: "velero.io"
            resources: ["deletebackuprequests"]
          - group: "connectors.alauda.io"
          - group: "operator.connectors.alauda.io"
            resources: ["connectorscores", "connectorsgits", "connectorsocis"]
        - level: Metadata
    - path: /usr/local/bin/capv-load-local-images.sh
      owner: "root:root"
      permissions: "0755"
      content: |
        #!/bin/bash
        set -euo pipefail
        until mountpoint -q /var/lib/containerd; do
          echo "waiting for /var/lib/containerd mount"
          sleep 1
        done
        systemctl restart containerd
        until systemctl is-active --quiet containerd; do
          echo "waiting for containerd"
          sleep 1
        done
        if [ ! -d "/root/images" ]; then
          echo "ERROR: /root/images directory not found" >&2
          exit 1
        fi
        image_count=0
        for image_file in /root/images/*.tar; do
          if [ -f "$image_file" ]; then
            echo "importing image: $image_file"
            ctr -n k8s.io images import "$image_file"
            image_count=$((image_count + 1))
          fi
        done
        if [ "$image_count" -eq 0 ]; then
          echo "ERROR: no tar files found in /root/images" >&2
          exit 1
        fi
        echo "imported $image_count images"
    preKubeadmCommands:
    - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
    - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts
    - echo "127.0.0.1   {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
    - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
    - /usr/local/bin/capv-load-local-images.sh
    postKubeadmCommands:
    - chmod 600 /var/lib/kubelet/config.yaml
    clusterConfiguration:
      imageRepository: <image_registry>/tkestack
      dns:
        imageTag: <dns_image_tag>
      etcd:
        local:
          imageTag: <etcd_image_tag>
      apiServer:
        extraArgs:
          admission-control-config-file: /etc/kubernetes/admission/psa-config.yaml
          audit-log-format: json
          audit-log-maxage: "30"
          audit-log-maxbackup: "10"
          audit-log-maxsize: "200"
          audit-log-mode: batch
          audit-log-path: /etc/kubernetes/audit/audit.log
          audit-policy-file: /etc/kubernetes/audit/policy.yaml
          encryption-provider-config: /etc/kubernetes/encryption-provider.conf
          kubelet-certificate-authority: /etc/kubernetes/pki/ca.crt
          profiling: "false"
          tls-cipher-suites: TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
          tls-min-version: VersionTLS12
        extraVolumes:
        - hostPath: /etc/kubernetes
          mountPath: /etc/kubernetes
          name: vol-dir-0
          pathType: Directory
      controllerManager:
        extraArgs:
          bind-address: "::"
          cloud-provider: external
          profiling: "false"
          tls-min-version: VersionTLS12
      scheduler:
        extraArgs:
          bind-address: "::"
          profiling: "false"
          tls-min-version: VersionTLS12
    initConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        ignorePreflightErrors:
        - ImagePull
        kubeletExtraArgs:
          cloud-provider: external
          node-labels: kube-ovn/role=master
        name: '{{ local_hostname }}'
      patches:
        directory: /etc/kubernetes/patches
    joinConfiguration:
      nodeRegistration:
        criSocket: /var/run/containerd/containerd.sock
        ignorePreflightErrors:
        - ImagePull
        kubeletExtraArgs:
          cloud-provider: external
          node-labels: kube-ovn/role=master
          volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
        name: '{{ local_hostname }}'
      patches:
        directory: /etc/kubernetes/patches

Apply the manifest:

kubectl apply -f 20-control-plane.yaml

Create the Worker Objects

Create the worker machine template, bootstrap template, and MachineDeployment.

30-workers-md-0.yaml
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: VSphereMachineTemplate
metadata:
  name: <cluster_name>-worker
  namespace: <namespace>
spec:
  template:
    spec:
      server: "<vsphere_server>"
      template: "<template_name>"
      cloneMode: <clone_mode>
      folder: "<vm_folder>"
      datastore: "<worker_system_datastore>"
      diskGiB: <worker_system_disk_gib>
      memoryMiB: <worker_memory_mib>
      numCPUs: <worker_num_cpus>
      os: Linux
      powerOffMode: <power_off_mode>
      network:
        devices:
        - networkName: "<nic1_network_name>"
      machineConfigPoolRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineConfigPool
        name: <cluster_name>-worker-pool
        namespace: <namespace>
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfigTemplate
metadata:
  name: <cluster_name>-worker-bootstrap
  namespace: <namespace>
spec:
  template:
    spec:
      files:
      - path: /etc/kubernetes/patches/kubeletconfiguration0+strategic.json
        owner: "root:root"
        permissions: "0644"
        content: |
          {
            "apiVersion": "kubelet.config.k8s.io/v1beta1",
            "kind": "KubeletConfiguration",
            "protectKernelDefaults": true,
            "staticPodPath": null,
            "streamingConnectionIdleTimeout": "5m",
            "tlsCertFile": "/etc/kubernetes/pki/kubelet.crt",
            "tlsPrivateKeyFile": "/etc/kubernetes/pki/kubelet.key"
          }
      - path: /usr/local/bin/capv-load-local-images.sh
        owner: "root:root"
        permissions: "0755"
        content: |
          #!/bin/bash
          set -euo pipefail
          until mountpoint -q /var/lib/containerd; do
            echo "waiting for /var/lib/containerd mount"
            sleep 1
          done
          systemctl restart containerd
          until systemctl is-active --quiet containerd; do
            echo "waiting for containerd"
            sleep 1
          done
          if [ ! -d "/root/images" ]; then
            echo "ERROR: /root/images directory not found" >&2
            exit 1
          fi
          image_count=0
          for image_file in /root/images/*.tar; do
            if [ -f "$image_file" ]; then
              echo "importing image: $image_file"
              ctr -n k8s.io images import "$image_file"
              image_count=$((image_count + 1))
            fi
          done
          if [ "$image_count" -eq 0 ]; then
            echo "ERROR: no tar files found in /root/images" >&2
            exit 1
          fi
          echo "imported $image_count images"
      joinConfiguration:
        nodeRegistration:
          criSocket: /var/run/containerd/containerd.sock
          ignorePreflightErrors:
          - ImagePull
          kubeletExtraArgs:
            cloud-provider: external
            volume-plugin-dir: "/opt/libexec/kubernetes/kubelet-plugins/volume/exec/"
          name: '{{ local_hostname }}'
        patches:
          directory: /etc/kubernetes/patches
      preKubeadmCommands:
      - hostnamectl set-hostname "{{ ds.meta_data.hostname }}"
      - echo "::1         ipv6-localhost ipv6-loopback localhost6 localhost6.localdomain6" >/etc/hosts
      - echo "127.0.0.1   {{ ds.meta_data.hostname }} {{ local_hostname }} localhost localhost.localdomain localhost4 localhost4.localdomain4" >>/etc/hosts
      - while ! ip route | grep -q "default via"; do sleep 1; done; echo "NetworkManager started"
      - /usr/local/bin/capv-load-local-images.sh
      postKubeadmCommands:
      - chmod 600 /var/lib/kubelet/config.yaml
      users:
      - name: boot
        sudo: ALL=(ALL) NOPASSWD:ALL
        sshAuthorizedKeys:
        - "<ssh_public_key>"
---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: <cluster_name>-md-0
  namespace: <namespace>
spec:
  clusterName: <cluster_name>
  replicas: <worker_replicas>
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  selector:
    matchLabels:
      nodepool: md-0
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: <cluster_name>
        nodepool: md-0
    spec:
      clusterName: <cluster_name>
      version: "<k8s_version>"
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfigTemplate
          name: <cluster_name>-worker-bootstrap
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: VSphereMachineTemplate
        name: <cluster_name>-worker

Apply the manifest:

kubectl apply -f 30-workers-md-0.yaml

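In the baseline workflow, note the following worker-specific rules:

  • The main worker manifest does not set failureDomain by default, because the baseline workflow assumes a single datacenter. To place a worker MachineDeployment in a specific VSphereDeploymentZone, add failureDomain as described in Extension Scenarios.
  • Some environments add extra runtime-image replacement commands or service restart commands to the KubeadmConfigTemplate. These commands are intentionally left out of the baseline example. Add them only when the platform requirements in your environment explicitly call for them.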

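Wait for the Cluster to Become Ready

After all manifests are applied, cluster creation proceeds asynchronously. Monitor progress with the following command: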

kubectl -n <namespace> get cluster,kubeadmcontrolplane,machinedeployment,machine -w

Before you continue to verification, wait until the KubeadmControlPlane reports the expected number of ready replicas and all Machine objects reach the Running phase.
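
The wait condition above can be scripted instead of watched by hand. The following is a minimal sketch, not a supported tool: the function names and the 10-second polling interval are assumptions, and it relies on the <cluster_name>-kcp naming used elsewhere in this document and on the cluster.x-k8s.io/cluster-name label that Cluster API sets on Machine objects.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the default interval are illustrative.

# kcp_replicas_ready NAMESPACE KCP_NAME
# Succeeds when status.readyReplicas equals spec.replicas.
kcp_replicas_ready() {
  local namespace="$1" kcp="$2" desired ready
  desired="$(kubectl -n "$namespace" get kubeadmcontrolplane "$kcp" \
    -o jsonpath='{.spec.replicas}')"
  ready="$(kubectl -n "$namespace" get kubeadmcontrolplane "$kcp" \
    -o jsonpath='{.status.readyReplicas}')"
  # An empty readyReplicas means no replica is ready yet.
  [ -n "$desired" ] && [ "${ready:-0}" = "$desired" ]
}

# machines_not_running NAMESPACE CLUSTER_NAME
# Prints the names of Machines whose phase is not Running.
machines_not_running() {
  local namespace="$1" cluster="$2"
  kubectl -n "$namespace" get machine \
    -l "cluster.x-k8s.io/cluster-name=$cluster" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.phase}{"\n"}{end}' |
    awk '$2 != "Running" { print $1 }'
}

# wait_for_cluster NAMESPACE CLUSTER_NAME [INTERVAL_SECONDS]
# Polls until both conditions hold. Run it against the global cluster.
wait_for_cluster() {
  local namespace="$1" cluster="$2" interval="${3:-10}"
  until kcp_replicas_ready "$namespace" "$cluster-kcp" &&
        [ -z "$(machines_not_running "$namespace" "$cluster")" ]; do
    echo "waiting for $cluster"
    sleep "$interval"
  done
  echo "$cluster: control plane replicas ready, all Machines Running"
}
```

Source the file and call wait_for_cluster with the namespace and cluster name. The loop has no timeout of its own, so wrap the call in the timeout utility if you want it to give up after a fixed period.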

Verification

Use the following commands to verify the cluster creation workflow.

  1. Check the CPI delivery resources in the global cluster:
    kubectl -n <namespace> get clusterresourceset
    kubectl -n <namespace> get clusterresourcesetbinding
  2. Export the workload cluster kubeconfig:
    kubectl -n <namespace> get secret <cluster_name>-kubeconfig -o jsonpath='{.data.value}' | base64 -d > /tmp/<cluster_name>.kubeconfig
  3. Check whether the vSphere CPI daemonset was created in the workload cluster:
    kubectl --kubeconfig=/tmp/<cluster_name>.kubeconfig -n kube-system get daemonset
  4. Check the objects in the global cluster:
    kubectl -n <namespace> get cluster,vspherecluster,kubeadmcontrolplane,machinedeployment,machine,vspheremachine,vspherevm
  5. Check the workload cluster nodes:
    kubectl --kubeconfig=/tmp/<cluster_name>.kubeconfig get nodes -o wide

Confirm the following results:

  • vsphere-cloud-controller-manager appears in the workload cluster.
  • The control plane and worker nodes are created.
  • The nodes eventually become Ready.
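
The checks above can be combined into a single pass or fail result. This is a hedged sketch: the function names are illustrative, the pod selector reuses the k8s-app=vsphere-cloud-controller-manager label from the troubleshooting section, and a node in Ready,SchedulingDisabled state is counted as Ready.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of any tool.

# Reads `kubectl get nodes --no-headers` output on stdin and prints the
# names of nodes whose STATUS column does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# verify_workload_cluster KUBECONFIG
# Returns 0 only when CPI pods exist and every node is Ready.
verify_workload_cluster() {
  local kubeconfig="$1" cpi_pods pending

  cpi_pods="$(kubectl --kubeconfig="$kubeconfig" -n kube-system get pods \
    -l k8s-app=vsphere-cloud-controller-manager --no-headers 2>/dev/null)"
  if [ -z "$cpi_pods" ]; then
    echo "FAIL: no vsphere-cloud-controller-manager pods found" >&2
    return 1
  fi

  pending="$(kubectl --kubeconfig="$kubeconfig" get nodes --no-headers | not_ready_nodes)"
  if [ -n "$pending" ]; then
    echo "FAIL: nodes not Ready: $(echo "$pending" | tr '\n' ' ')" >&2
    return 1
  fi

  echo "OK: CPI pods present and all nodes Ready"
}
```

Pass the exported kubeconfig path, for example /tmp/<cluster_name>.kubeconfig. Because nodes need time to register, a failure shortly after creation is expected; rerun the check before you treat it as an error.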

Troubleshooting

When the workflow fails, start with the following commands:

kubectl -n <namespace> describe cluster <cluster_name>
kubectl -n <namespace> describe vspherecluster <cluster_name>
kubectl -n <namespace> describe kubeadmcontrolplane <cluster_name>-kcp
kubectl -n <namespace> describe machinedeployment <cluster_name>-md-0
kubectl -n <namespace> get cluster,vspherecluster,kubeadmcontrolplane,machinedeployment,machine,vspheremachine,vspherevm
kubectl -n cpaas-system logs deploy/capi-controller-manager

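Check the following items first:

  • If the CPI resources are not delivered, verify that ClusterResourceSet=true is enabled, then check the ClusterResourceSet and the ClusterResourceSetBinding.
  • If the ClusterResourceSet exists but no ClusterResourceSetBinding is created, check whether the controller has the required delete permission on the referenced ConfigMap and Secret resources.
  • If the network plugin is not installed, verify that the required cluster annotations exist and that the platform controllers have processed them.
  • If the cpaas.io/registry-address annotation is missing, verify the public registry credentials and the platform controller that injects the annotation.
  • If a machine is stuck in Provisioning, check the MachineConfigPoolReady condition on the VSphereMachine. It shows whether slot allocation failed because of the pool binding or a datacenter mismatch.
  • If a VM is waiting for IP allocation, verify VMware Tools, the static IP settings, and VSphereVM.status.addresses.
  • If a workload Node object never gets spec.providerID, first verify the CPI delivery resources, then check for duplicate vCenter guest hostnames. When an old VM in the same datacenter still reports the same guest hostname as the new node, cloud-provider-vsphere may fall back to a node-name lookup, cache the old VM, and reject the new node because the VM IP does not match the kubelet node IP. Check the logs of the leader vsphere-cloud-controller-manager, the node SystemUUID, the actual VM UUID, and the vCenter guest hostname and IP values. After you fix or remove the duplicate hostname or the old-VM conflict, restart the workload cluster's vsphere-cloud-controller-manager Pods to clear the stale in-memory cache: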
    kubectl --kubeconfig=/tmp/<cluster_name>.kubeconfig -n kube-system delete pod \
      -l k8s-app=vsphere-cloud-controller-manager
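  • If the datastore runs out of space, check whether old VM directories or .vmdk files remain in the target datastore.
  • If the template system disk size does not match the manifest value, first check the actual clone mode. When a VM is created as a linkedClone, the system disk keeps the template size and diskGiB is ignored. Only fullClone uses diskGiB, and in that case diskGiB must not be smaller than the template disk size.
  • If the control plane endpoint does not come up, verify the load balancer, the VIP, and port 6443.
  • If the TLS connection to vCenter fails, verify the thumbprint, the vCenter address, and whether proxy settings interfere with the connection.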
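
Two of the checks in this list lend themselves to small helpers. The sketch below uses illustrative function names and is only a starting point: the first pair lists nodes that still lack spec.providerID together with their SystemUUID, and the last function restates the clone-mode rule for the system disk as described above, without querying vCenter.

```shell
#!/usr/bin/env bash
# Sketch only: all function names are illustrative, not part of any tool.

# Reads a three-column listing (NAME, PROVIDER_ID, SYSTEM_UUID) on stdin and
# prints name and SystemUUID for nodes that have no providerID.
# kubectl prints <none> for a missing custom-column value.
nodes_missing_provider_id() {
  awk '$2 == "<none>" { print $1, $3 }'
}

# list_nodes_missing_provider_id KUBECONFIG
list_nodes_missing_provider_id() {
  local kubeconfig="$1"
  kubectl --kubeconfig="$kubeconfig" get nodes --no-headers \
    -o custom-columns='NAME:.metadata.name,PROVIDER_ID:.spec.providerID,SYSTEM_UUID:.status.nodeInfo.systemUUID' |
    nodes_missing_provider_id
}

# effective_system_disk_gib CLONE_MODE TEMPLATE_GIB DISK_GIB
# Prints the system disk size the VM ends up with under the rule above.
effective_system_disk_gib() {
  local clone_mode="$1" template_gib="$2" disk_gib="$3"
  case "$clone_mode" in
    linkedClone)
      # diskGiB is ignored; the disk keeps the template size.
      echo "$template_gib"
      ;;
    fullClone)
      if [ "$disk_gib" -lt "$template_gib" ]; then
        echo "ERROR: diskGiB ($disk_gib) is smaller than the template disk ($template_gib)" >&2
        return 1
      fi
      echo "$disk_gib"
      ;;
    *)
      echo "ERROR: unknown clone mode: $clone_mode" >&2
      return 1
      ;;
  esac
}
```

Compare the SystemUUID values printed by list_nodes_missing_provider_id with the VM UUIDs shown in vCenter to spot a node that was matched to an old VM.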

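When you review controller logs, follow these rules:

  • deploy/capi-controller-manager runs in the cpaas-system namespace of the global cluster.
  • Do not use the workload cluster kubeconfig to view capi-controller-manager logs.
  • If platform controllers process the cluster network annotations, also check the platform network-controller logs and the platform cluster-lifecycle-controller logs.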

Next Steps

Once the baseline topology is running, continue with Extension Scenarios if you need a second NIC, multiple datacenters, failure domains, additional data disks, or more worker replicas.