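Using Kubeflow Pipelines

Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The KFP SDK lets you define and manipulate pipelines and components in Python.

Prerequisites

Install the KFP SDK

Start a Jupyter Notebook (or Workbench) in your namespace and install the KFP SDK: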

python -m pip install kfp
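
To confirm that the installation succeeded, you can check the installed version from the same notebook. The following is a minimal sketch that uses only the Python standard library; the helper name installed_version is hypothetical and not part of the KFP SDK.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("kfp")
    print(f"kfp version: {version}" if version else "kfp is not installed")
```

If the helper reports that kfp is not installed, the notebook kernel may be using a different Python environment than the one pip installed into.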

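Configure KFP to use your object storage

When you install Kubeflow with an external S3/MinIO storage service, you need to add a "KFP Launcher" configmap that sets the storage used by the current namespace or user. See the Kubeflow documentation at https://www.kubeflow.org/docs/components/pipelines/operator-guides/configure-object-store/#s3-and-s3-compatible-provider for details. If this configuration is not set, pipeline runs may still try to reach the default service address, such as "minio-service.kubeflow:9000", which may be incorrect.

Here is a simple example to get you started: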

apiVersion: v1
data:
  defaultPipelineRoot: s3://mlpipeline
  providers: |-
    s3:
      default:
        endpoint: minio.minio-system.svc:80
        disableSSL: true
        region: us-east-2
        forcePathStyle: true
        credentials:
          fromEnv: false
          secretRef:
            secretName: mlpipeline-minio-artifact
            accessKeyKey: accesskey
            secretKeyKey: secretkey
kind: ConfigMap
metadata:
  name: kfp-launcher
  namespace: wy-testns

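For example, set the following values in this configmap to point to your own S3/MinIO storage:

defaultPipelineRoot: the location where pipeline intermediate data is stored
endpoint: the S3/MinIO service endpoint. Note that it must not start with "http" or "https"
disableSSL: whether to disable "https" access to the endpoint
region: the S3 region. With MinIO, any value works
credentials: the access key and secret key (AK/SK) stored in a secret

After you add this configmap, newly started Kubeflow Pipelines runs read the configuration automatically and store the related pipeline data there.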
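
Because endpoint must not carry a scheme while disableSSL encodes whether "https" is used, it can help to derive both values from a single service URL. The sketch below assumes you start from a full URL such as http://minio.minio-system.svc:80; the function name launcher_endpoint_settings is hypothetical and only illustrates the relationship between the two fields.

```python
from urllib.parse import urlsplit


def launcher_endpoint_settings(url: str) -> dict:
    """Split a storage URL into the `endpoint` and `disableSSL` configmap values."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"expected an http:// or https:// URL, got: {url!r}")
    if not parts.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    return {
        # Host and optional port only; the scheme is deliberately dropped.
        "endpoint": parts.netloc,
        # Plain http means SSL must be disabled in the launcher config.
        "disableSSL": parts.scheme == "http",
    }


if __name__ == "__main__":
    print(launcher_endpoint_settings("http://minio.minio-system.svc:80"))
```

The returned values can then be copied into the endpoint and disableSSL fields of the kfp-launcher configmap.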

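Quick start example

A pipeline is a description of a machine learning workflow, including all of the components in the workflow and how they are combined in the form of a graph.

Below is a simple pipeline, defined with the KFP SDK, that prints "Hello, World!".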

from kfp import dsl
from kfp import compiler
from kfp.client import Client

@dsl.component
def say_hello(name: str) -> str:
    hello_text = f'Hello, {name}!'
    print(hello_text)
    return hello_text

@dsl.pipeline
def hello_pipeline(recipient: str) -> str:
    hello_task = say_hello(name=recipient)
    return hello_task.output


# Compile the pipeline to a YAML file
compiler.Compiler().compile(hello_pipeline, 'pipeline.yaml')

# Create a KFP client and submit a pipeline run
client = Client(host='<MY-KFP-ENDPOINT>')
run = client.create_run_from_pipeline_package(
    'pipeline.yaml',
    arguments={
        'recipient': 'World',
    },
)

For more details on how to define and run pipelines, see the official KFP documentation: https://www.kubeflow.org/docs/components/pipelines/user-guides/
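
After submitting a run as in the quick start example, you may want to block until it finishes and then inspect its final state. The sketch below assumes the KFP SDK v2 client, where Client.wait_for_run_completion takes a run ID and a timeout in seconds and returns a run object with a state attribute; verify this against your installed SDK version. The helper name wait_and_report is hypothetical.

```python
def wait_and_report(client, run_id: str, timeout_seconds: int = 600) -> str:
    """Wait for a run to finish and return its final state as a string.

    `client` is assumed to be a kfp.client.Client (SDK v2).
    """
    finished_run = client.wait_for_run_completion(
        run_id=run_id,
        timeout=timeout_seconds,
    )
    state = str(finished_run.state)
    print(f"Run {run_id} finished with state: {state}")
    return state
```

With the objects from the quick start example, this could be called as wait_and_report(client, run.run_id).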

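Manage pipelines in the UI

You can also manage pipelines, experiments, and runs directly from the Kubeflow dashboard.

Access the Pipelines dashboard

Log in to the Kubeflow Central Dashboard.
Click Pipelines in the sidebar menu.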

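Upload a pipeline

If you have already compiled your pipeline to a YAML file (for example, pipeline.yaml from the example above), you can upload it:

Click Pipelines -> Upload Pipeline.
Upload a file: select your pipeline.yaml.
Pipeline name: give it a name (for example, Hello World Pipeline).
Click Create.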
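
The same upload can be done from a notebook with the SDK instead of the UI. This is a minimal sketch that assumes a kfp.client.Client whose upload_pipeline method accepts pipeline_package_path and pipeline_name; the wrapper name upload_compiled_pipeline is hypothetical, and the file check is an added convenience rather than something the SDK requires.

```python
import os


def upload_compiled_pipeline(client, package_path: str, pipeline_name: str):
    """Upload a compiled pipeline YAML file under the given display name.

    `client` is assumed to be a kfp.client.Client.
    """
    # Fail early with a clear message if the compiled file is missing.
    if not os.path.isfile(package_path):
        raise FileNotFoundError(f"compiled pipeline not found: {package_path}")
    return client.upload_pipeline(
        pipeline_package_path=package_path,
        pipeline_name=pipeline_name,
    )
```

For the quick start example, the call would look like upload_compiled_pipeline(client, 'pipeline.yaml', 'Hello World Pipeline').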

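Create a run

Run the pipeline you just uploaded:

Click the pipeline name to open its details.
Click Create Run.
Run name: enter a descriptive name.
Experiment: select an existing experiment or create a new one. Experiments group related runs.
Run parameters: enter values for the pipeline parameters (for example, recipient: World).
Click Start.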

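View run details

After the run starts, you are redirected to the Run Details page.

Graph: visualizes the pipeline steps (components) and their status (running, succeeded, failed).
Logs: click a step in the graph to view its container logs in the side panel. This is essential for debugging.
Inputs/Outputs: view the artifacts passed between steps or the final outputs.
Visualizations: if the pipeline produces metrics or charts, they appear in the Run Output or Visualizations tab.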

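Recurring runs

You can schedule a pipeline to run automatically at specific intervals:

Find your pipeline in the Pipelines list.
Click Create Run, but choose Recurring Run as the run type (or navigate to Experiments (KFP) -> Create Recurring Run).
Trigger: set the schedule (for example, Periodic or Cron).
Parameters: configure the inputs used for each scheduled execution.
Click Start.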
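
A recurring run can also be created from the SDK. The sketch below assumes the KFP SDK v2 client, where create_experiment returns an object with an experiment_id attribute and create_recurring_run accepts a cron expression with six space-separated fields (seconds first); check both assumptions against your installed version. The function name schedule_pipeline and the example values are hypothetical.

```python
def schedule_pipeline(client, package_path: str, experiment_name: str,
                      job_name: str, cron_expression: str, params: dict):
    """Create a cron-triggered recurring run for a compiled pipeline.

    `client` is assumed to be a kfp.client.Client (SDK v2).
    """
    # Assumption: KFP expects 6 fields (second minute hour day month weekday).
    if len(cron_expression.split()) != 6:
        raise ValueError(
            f"expected a 6-field cron expression, got: {cron_expression!r}"
        )
    experiment = client.create_experiment(name=experiment_name)
    return client.create_recurring_run(
        experiment_id=experiment.experiment_id,
        job_name=job_name,
        cron_expression=cron_expression,
        pipeline_package_path=package_path,
        params=params,
    )
```

For example, schedule_pipeline(client, 'pipeline.yaml', 'hello-experiment', 'hello-daily', '0 0 2 * * *', {'recipient': 'World'}) would request a run every day at 02:00 under these assumptions.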
