Integration with InferenceService

This page shows how to leverage the scheduling and resource management capabilities of Alauda Build of Kueue when running an InferenceService in Alauda AI.

Prerequisites

  • You have installed Alauda AI.
  • You have installed Alauda Build of Kueue (a quick verification sketch follows this list).
  • You have installed Alauda Build of Hami (used to demonstrate vGPU).
  • The Alauda Container Platform Web CLI can communicate with your cluster.
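
Before you start, you can confirm that the Kueue CRDs used below are available in the cluster. This is a minimal check, assuming kubectl access to the target cluster:

    kubectl get crd clusterqueues.kueue.x-k8s.io localqueues.kueue.x-k8s.io resourceflavors.kueue.x-k8s.io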

Procedure

  1. In Alauda Container Platform, create a project and a namespace, for example, a project named test and a namespace named test-1.
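
    As a quick check from the CLI, you can confirm that the namespace exists (assuming the names above):

    kubectl get namespace test-1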

  2. Switch to Alauda AI, click Namespace Manage under Admin > Management Namespace, and select the previously created namespace to bring it under management.

  3. Create the resources by running the following command:

    cat <<EOF | kubectl create -f -
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}
      resourceGroups:
      - coveredResources: ["cpu", "memory", "pods", "ephemeral-storage", "nvidia.com/gpualloc", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 9
          - name: "memory"
            nominalQuota: 36Gi
          - name: "pods"
            nominalQuota: 5
          - name: "nvidia.com/gpualloc"
            nominalQuota: "2"
          - name: "nvidia.com/total-gpucores"
            nominalQuota: "50"
          - name: "nvidia.com/total-gpumem"
            nominalQuota: "20000"
          - name: "ephemeral-storage"
            nominalQuota: 100Gi
    ---
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ResourceFlavor
    metadata:
      name: default-flavor
    ---
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: LocalQueue
    metadata:
      namespace: test-1
      name: test
    spec:
      clusterQueue: cluster-queue
    EOF
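
    You can verify that the objects were created and that the LocalQueue is bound to the ClusterQueue:

    kubectl get clusterqueue cluster-queue
    kubectl -n test-1 get localqueue test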
  4. In the Alauda AI UI, create an InferenceService resource with the label kueue.x-k8s.io/queue-name: test:

    kind: InferenceService
    apiVersion: serving.kserve.io/v1beta1
    metadata:
      labels:
        kueue.x-k8s.io/queue-name: test
      name: test
      namespace: test-1
    # ...
    spec:
      predictor:
        model:
          resources:
            limits:
              cpu: '1'
              ephemeral-storage: 10Gi
              memory: 2Gi
              nvidia.com/gpualloc: '1'
              nvidia.com/gpucores: '80'
              nvidia.com/gpumem: 8k
    # ...
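
    Assuming Kueue's pod integration is enabled for this namespace, Kueue creates a Workload object for the InferenceService's pod. You can list it to confirm the pod was queued for admission (the generated name will differ in your cluster):

    kubectl -n test-1 get workload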
  5. Observe the InferenceService's pods:

    kubectl -n test-1 get pod | grep test-predictor

    You will see that the pod is in the SchedulingGated state:

    test-predictor-8475554f4d-zw7lp   0/1     SchedulingGated   0          13s
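
    Kueue keeps the pod unschedulable by attaching a scheduling gate until quota is available. You can inspect the gate directly (substitute your pod name); it should list the kueue.x-k8s.io/admission gate:

    kubectl -n test-1 get pod test-predictor-8475554f4d-zw7lp -o jsonpath='{.spec.schedulingGates}'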
    
  6. The pod is gated because the requested nvidia.com/gpucores (80) exceeds the nvidia.com/total-gpucores quota (50) in the ClusterQueue. Update the nvidia.com/total-gpucores quota:

    cat <<EOF | kubectl apply -f -
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ClusterQueue
    metadata:
      name: cluster-queue
    spec:
      namespaceSelector: {}
      resourceGroups:
      - coveredResources: ["cpu", "memory", "pods", "ephemeral-storage", "nvidia.com/gpualloc", "nvidia.com/total-gpucores", "nvidia.com/total-gpumem"]
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 9
          - name: "memory"
            nominalQuota: 36Gi
          - name: "pods"
            nominalQuota: 5
          - name: "nvidia.com/gpualloc"
            nominalQuota: "2"
          - name: "nvidia.com/total-gpucores"
            nominalQuota: "100"
          - name: "nvidia.com/total-gpumem"
            nominalQuota: "20000"
          - name: "ephemeral-storage"
            nominalQuota: 100Gi
    EOF

    Kueue now admits the workload and removes the scheduling gate, and you will see that the pod is in the Running state:

    test-predictor-8475554f4d-zw7lp   1/1     Running   0          13s
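
    To double-check admission, you can list the Workload again and inspect the ClusterQueue's reported usage; describe shows the admitted resources without relying on specific status field paths:

    kubectl -n test-1 get workload
    kubectl describe clusterqueue cluster-queue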