AI Guardrails for LLM Safety

The TrustyAI Guardrails Orchestrator runs detectors on LLM inputs and outputs to filter or flag content. It is based on the open-source FMS-Guardrails project. The TrustyAI Operator provides the GuardrailsOrchestrator CRD to deploy and manage it.

This document covers only AutoConfig deployment and the built-in regex detector.

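Contents

Prerequisites
Deploy with AutoConfig
Resource status
Built-in regex detector
Gateway ConfigMap structure
Service ports and API reference
Ports and roles
Authentication (auth enabled)
Request path reference
API usage and examples
Standalone detection (/api/v1/text/contents)
Orchestrator API: per-request detector selection (/api/v2/chat/completions-detection)
Gateway API: preset pipeline (/all/v1/chat/completions)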

Prerequisites

The TrustyAI Operator is installed (see Install TrustyAI).
An LLM is deployed as an InferenceService in the target namespace.

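Deploy with AutoConfig

With AutoConfig, the operator generates the orchestrator and gateway configuration from resources in the namespace; a basic setup requires no manually created ConfigMaps.

Create a GuardrailsOrchestrator custom resource with autoConfig set and with the built-in detector and gateway enabled: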

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: GuardrailsOrchestrator
metadata:
  name: guardrails-orchestrator
  namespace: <your-namespace>
  # Optional: set to "true" to enable authentication on the route (requires a Bearer token)
  annotations:
    security.opendatahub.io/enable-auth: "false"
spec:
  autoConfig:
    inferenceServiceToGuardrail: <inference-service-name>
  enableBuiltInDetectors: true
  enableGuardrailsGateway: true
  replicas: 1

inferenceServiceToGuardrail: name of the InferenceService (LLM) to guard; it must match a model deployed in the same namespace.
enableBuiltInDetectors: when true, adds a built-in regex detector sidecar.
enableGuardrailsGateway: when true, the gateway exposes preset routes (for example /all/v1/chat/completions).

Resource status

The resource has a status subresource. status.phase is one of Progressing, Ready, or Error. With AutoConfig, status.autoConfigState holds the generated ConfigMap names (generatedConfigMap, generatedGatewayConfigMap), the detected services, and a message. Send traffic only after status.phase == Ready and the corresponding Deployment is ready.

The operator creates an orchestrator ConfigMap and a gateway ConfigMap named <orchestrator-name>-gateway-auto-config. The built-in detector is registered as built-in-detector.
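
The readiness rule above can be sketched as a small check over the JSON form of the custom resource (for example, what `kubectl get ... -o json` would return for it). This is only an illustration: the function names `is_ready` and `generated_config_maps` are hypothetical, the sample ConfigMap names are made-up values, and the Deployment state is passed in as a flag because its shape is not described here.

```python
import json


def is_ready(resource: dict, deployment_ready: bool) -> bool:
    """Return True only when status.phase is Ready and the Deployment is ready."""
    phase = (resource.get("status") or {}).get("phase")
    return phase == "Ready" and deployment_ready


def generated_config_maps(resource: dict) -> dict:
    """Collect the generated ConfigMap names from status.autoConfigState, if any."""
    state = (resource.get("status") or {}).get("autoConfigState") or {}
    return {
        "orchestrator": state.get("generatedConfigMap"),
        "gateway": state.get("generatedGatewayConfigMap"),
    }


# Illustrative resource; the ConfigMap names are sample values, not real output.
sample = json.loads("""
{
  "status": {
    "phase": "Ready",
    "autoConfigState": {
      "generatedConfigMap": "example-orchestrator-config",
      "generatedGatewayConfigMap": "guardrails-orchestrator-gateway-auto-config"
    }
  }
}
""")

print(is_ready(sample, deployment_ready=True))
print(generated_config_maps(sample)["gateway"])
```

A caller would typically poll this check and start sending requests only once it returns True.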

Built-in regex detector

The built-in detector provides regex-based algorithms. Supported algorithms:

Category	Algorithms
regex	`email`, `us-social-security-number`, `credit-card`, `ipv4`, `ipv6`, `us-phone-number`, `uk-post-code`, or a custom regex

The default gateway configuration uses a placeholder regex ($^). To enable a specific algorithm (for example email), patch the ConfigMap and set detector_params.regex to the algorithm name (for example - email).

Gateway ConfigMap structure

ConfigMap name: <orchestrator-name>-gateway-auto-config. Example:

apiVersion: v1
kind: ConfigMap
metadata:
  name: guardrails-orchestrator-gateway-auto-config
  namespace: <your-namespace>
data:
  config.yaml: |
    orchestrator:
      host: "localhost"
      port: 8032
    detectors:
      - name: built-in-detector
        input: true
        output: true
        detector_params:
          regex:
            - $^                          # Replace this placeholder, for example with email
    routes:
      - name: all
        detectors:
          - built-in-detector
      - name: passthrough
        detectors:

Change the regex under built-in-detector to the desired algorithm (for example - email). After the update, wait for the Deployment to become ready.
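
As a rough sketch of that edit, the helper below rewrites the list under the `regex:` key in the `config.yaml` text using only the standard library. The function name `set_regex_algorithms` is hypothetical, and the line-based approach assumes the layout shown in the example ConfigMap (list items indented two spaces below `regex:`); a real YAML parser would be more robust.

```python
def set_regex_algorithms(config_yaml: str, algorithms: list[str]) -> str:
    """Replace the list items under the first `regex:` key with `algorithms`."""
    lines = config_yaml.splitlines()
    out = []
    replaced = False
    i = 0
    while i < len(lines):
        line = lines[i]
        out.append(line)
        i += 1
        if replaced or line.strip() != "regex:":
            continue
        # List items sit two spaces deeper than the `regex:` key in the sample.
        item_prefix = " " * (len(line) - len(line.lstrip()) + 2) + "- "
        while i < len(lines) and lines[i].startswith(item_prefix):
            i += 1  # drop the old entries, including the $^ placeholder
        for name in algorithms:
            escaped = name.replace("'", "''")  # YAML single-quote escaping
            out.append(f"{item_prefix}'{escaped}'")
        replaced = True
    if not replaced:
        raise ValueError("no `regex:` key found")
    return "\n".join(out) + "\n"


# Trimmed copy of the example gateway configuration.
SAMPLE = """\
detectors:
  - name: built-in-detector
    input: true
    output: true
    detector_params:
      regex:
        - $^
routes:
  - name: all
    detectors:
      - built-in-detector
"""

print(set_regex_algorithms(SAMPLE, ["email"]))
```

The resulting text would then be written back to the `config.yaml` key of the gateway ConfigMap, for example with `kubectl edit` or `kubectl patch`.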

Service ports and API reference

The Guardrails Orchestrator is exposed through a Service named <orchestrator-name>-service. Port numbers depend on whether authentication is enabled (set the annotation security.opendatahub.io/enable-auth: "true" on the GuardrailsOrchestrator).

Ports and roles

Port name	Auth disabled	Auth enabled	Role
gateway	8090	8490	Guardrails Gateway: preset detector pipelines and an OpenAI-style chat completion endpoint. Use it to send chat requests that the built-in (or configured) detectors check before and after the LLM call.
built-in-detector	8080	8480	Built-in regex detector API. Standalone content detection (no LLM call). Request body: `contents` (list of strings) and `detector_params` (for example `regex: ["email"]`).
https (orchestrator)	8032	8432	Orchestrator API: direct access to orchestrator endpoints (for example custom detection flows, health checks).
health	8034	8034	Health check endpoint.

With authentication enabled, the gateway and built-in-detector ports require a Bearer token.
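
The port table above can be captured in a small lookup so that client code does not hard-code port numbers. This is a minimal sketch: the names `PORTS`, `service_port`, and `service_url` are hypothetical, the in-cluster host pattern follows the `<orchestrator-name>-service` naming described above, and the `https` scheme is an assumption taken from the curl examples in this document.

```python
# port name: (auth disabled, auth enabled), copied from the table above
PORTS = {
    "gateway": (8090, 8490),
    "built-in-detector": (8080, 8480),
    "orchestrator": (8032, 8432),
    "health": (8034, 8034),
}


def service_port(port_name: str, auth_enabled: bool) -> int:
    """Pick the port for a role depending on whether auth is enabled."""
    disabled, enabled = PORTS[port_name]
    return enabled if auth_enabled else disabled


def service_url(orchestrator_name: str, namespace: str, port_name: str,
                auth_enabled: bool, path: str) -> str:
    """Build an in-cluster URL for the given role and request path."""
    host = f"{orchestrator_name}-service.{namespace}.svc.cluster.local"
    return f"https://{host}:{service_port(port_name, auth_enabled)}{path}"


print(service_url("guardrails-orchestrator", "demo", "gateway", True,
                  "/all/v1/chat/completions"))
```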

Authentication (auth enabled)

Requests to the gateway or built-in-detector ports must include:

Authorization: Bearer <token>

The token must be a valid Kubernetes ServiceAccount token (or another token accepted by the cluster auth proxy), and its subject must have permission to access the service (for example services/proxy). Unauthorized requests return 401/403.

How to get a token

In the same namespace as the Guardrails Orchestrator, create a ServiceAccount, a Role (with get and create on services/proxy), and a RoleBinding; then create a token for the ServiceAccount:

# Replace <your-namespace>, and optionally the ServiceAccount name (for example guardrails-client)
kubectl create serviceaccount -n <your-namespace> guardrails-client
kubectl create role -n <your-namespace> guardrails-client --verb=get,create --resource=services/proxy
kubectl create rolebinding -n <your-namespace> guardrails-client --role=guardrails-client --serviceaccount=<your-namespace>:guardrails-client
kubectl create token -n <your-namespace> guardrails-client

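Optionally set the token lifetime, for example --duration=8760h for one year. The last command prints the token; use it as the value of the Authorization: Bearer <token> header.

In-cluster clients can use a projected ServiceAccount token volume as the Bearer token.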
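
For an in-cluster client, reading the mounted token and turning it into request headers might look like the sketch below. The helper name `bearer_headers` is hypothetical. The default path is the standard location where Kubernetes mounts a pod's ServiceAccount token; a custom projected volume would use whatever mountPath the pod spec sets. The demo uses a temporary file in place of the real mount so it can run anywhere.

```python
import tempfile
from pathlib import Path

# Standard mount path for a pod's ServiceAccount token in Kubernetes.
DEFAULT_TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def bearer_headers(token_path: str = DEFAULT_TOKEN_PATH) -> dict:
    """Read the token file and build the headers for a JSON request."""
    token = Path(token_path).read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"empty token in {token_path}")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Demo: a temporary file stands in for the mounted token.
with tempfile.TemporaryDirectory() as tmp:
    fake_token = Path(tmp) / "token"
    fake_token.write_text("example-token\n", encoding="utf-8")
    headers = bearer_headers(str(fake_token))

print(headers["Authorization"])
```

Mounted tokens are rotated by the kubelet, so re-reading the file for each request (or periodically) is safer than caching the value for the lifetime of the process.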

Request path reference

Path	Port	Purpose
`POST /all/v1/chat/completions`	gateway (8090 or 8490)	Chat completions through guardrails: the request is sent to the LLM after the input checks, and the response is checked by the detectors. Body: `model`, `messages` (OpenAI style). The detector pipeline is fixed by the gateway configuration.
`POST /api/v2/chat/completions-detection`	orchestrator (8032 or 8432)	Chat completions with per-request detector selection. Body: `model`, `messages`, and optional `detectors` (for example `{"input": {"built-in-detector": {"regex": ["email"]}}, "output": {...}}`). When detectors are specified, the response contains the model reply along with `detections` and `warnings`. Use this endpoint when the caller needs to choose which detectors run for each request instead of using a preset gateway route.
`POST /api/v1/text/contents`	built-in-detector (8080 or 8480)	Standalone content detection. Runs the built-in regex (or configured) detector on the given text; no LLM call. Body: `contents`, `detector_params`.
`GET /health`	orchestrator (8032 or 8432)	Health check for the orchestrator.

Other gateway routes (for example /<preset-name>/v1/chat/completions) are defined under routes in the gateway ConfigMap.

API usage and examples

Standalone detection (`/api/v1/text/contents`)

Runs the built-in regex detector on text without calling the LLM. Use the built-in-detector port (8080 or 8480). Request body: contents (list of strings) and detector_params (for example regex: ["email"]).

Use the service address (in-cluster: <orchestrator-name>-service.<your-namespace>.svc.cluster.local; outside the cluster: the Ingress host, if exposed) and the built-in-detector port (see Ports and roles).

# Set TOKEN if auth is enabled. Use port 8480 (auth) or 8080 (no auth).
curl -k -s -X POST "https://<service-address>:8480/api/v1/text/contents" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"contents":["hello, my email is test@example.com"],"detector_params":{"regex":["email"]}}'

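Response: an array with one inner array per entry in contents, each holding detection objects. Each object has start, end, text, detection (for example EmailAddress), detection_type (for example pii), and score.

Example response (standalone detection, email detected)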

[
  [
    {
      "detection": "EmailAddress",
      "detection_type": "pii",
      "end": 35,
      "score": 1.0,
      "start": 19,
      "text": "test@example.com"
    }
  ]
]
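
One way a client might use this response is to mask the detected spans before storing or forwarding the text. The sketch below is illustrative only: the `redact` helper is hypothetical, and it assumes `start` and `end` are character offsets with `end` exclusive, which matches the example response above.

```python
def redact(text: str, detections: list[dict], mask: str = "[REDACTED]") -> str:
    """Replace every detected span in `text` with `mask`."""
    # Work from right to left so earlier offsets stay valid after each edit.
    for det in sorted(detections, key=lambda d: d["start"], reverse=True):
        text = text[: det["start"]] + mask + text[det["end"]:]
    return text


contents = ["hello, my email is test@example.com"]
# Response copied from the example above: one inner list per entry in contents.
response = [
    [
        {
            "detection": "EmailAddress",
            "detection_type": "pii",
            "end": 35,
            "score": 1.0,
            "start": 19,
            "text": "test@example.com",
        }
    ]
]

redacted = [redact(text, dets) for text, dets in zip(contents, response)]
print(redacted[0])  # hello, my email is [REDACTED]
```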

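Orchestrator API: per-request detector selection (`/api/v2/chat/completions-detection`)

Use the orchestrator port (8032 or 8432) when the caller must choose which detectors run for each request. Request body: model, messages, and optional detectors (for example input / output with detector parameters).

Example: run the built-in detector with the email regex on both input and output:

# Use orchestrator port 8432 (auth) or 8032 (no auth).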
curl -k -s -X POST "https://<service-address>:8432/api/v2/chat/completions-detection" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{
    "model": "<inference-service-name>",
    "messages": [{"content": "my email is test@example.com", "role": "user"}],
    "detectors": {
      "input":  { "built-in-detector": { "regex": ["email"] } },
      "output": { "built-in-detector": { "regex": ["email"] } }
    }
  }'
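
The same request can be assembled programmatically. The sketch below builds (but does not send) the request with Python's standard `urllib`; the helper name `build_detection_request`, the host `guardrails.example.invalid`, the model name, and the token value are placeholders, not values from this document.

```python
import json
import urllib.request


def build_detection_request(base_url, model, user_message, regex_algorithms, token=None):
    """Build a POST request that runs the built-in detector on input and output."""
    params = {"built-in-detector": {"regex": list(regex_algorithms)}}
    body = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "detectors": {"input": params, "output": params},
    }
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    if token:  # only needed when auth is enabled
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url}/api/v2/chat/completions-detection",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_detection_request(
    "https://guardrails.example.invalid:8432",  # placeholder address
    "my-model",                                 # placeholder model name
    "my email is test@example.com",
    ["email"],
    token="example-token",
)
print(req.get_method(), req.full_url)
```

Sending is left to the caller, for example with `urllib.request.urlopen(req)` and an SSL context that trusts the cluster's certificate.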

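When a detector finds a match in the input (for example an email address), the response includes detections and warnings, and choices is empty:

Example response (input triggers a detection)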

{
  "id": "0c339dbab9ab45f59e6bf052d4fd78c6",
  "object": "",
  "created": 1773118440,
  "model": "......",
  "choices": [],
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens": 0
  },
  "detections": {
    "input": [
      {
        "message_index": 0,
        "results": [
          {
            "start": 12,
            "end": 28,
            "text": "test@example.com",
            "detection": "EmailAddress",
            "detection_type": "pii",
            "detector_id": "built-in-detector",
            "score": 1.0
          }
        ]
      }
    ]
  },
  "warnings": [
    {
      "type": "UNSUITABLE_INPUT",
      "message": "Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."
    }
  ]
}

The response structure is the same as for gateway chat: choices, detections, warnings. For a plain chat completion without detection, omit detectors.
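
Because both APIs share this structure, a single helper can interpret either response. The following is a sketch under stated assumptions: the `summarize` function and its `blocked` flag are hypothetical conveniences, where "blocked" means `choices` is empty and at least one warning is present, as in the example responses in this document.

```python
def summarize(response: dict) -> dict:
    """Reduce a chat response to the reply, the detections, and a blocked flag."""
    detections = response.get("detections") or {}
    hits = []
    for direction in ("input", "output"):
        for entry in detections.get(direction) or []:
            for result in entry.get("results", []):
                hits.append((direction, result["detection"], result["text"]))
    choices = response.get("choices") or []
    warnings = [w["type"] for w in response.get("warnings") or []]
    return {
        "blocked": not choices and bool(warnings),
        "reply": choices[0]["message"]["content"] if choices else None,
        "hits": hits,
        "warnings": warnings,
    }


# Trimmed versions of the two response shapes shown in this document.
blocked = {
    "choices": [],
    "detections": {
        "input": [{"message_index": 0, "results": [{
            "start": 12, "end": 28, "text": "test@example.com",
            "detection": "EmailAddress", "detection_type": "pii",
            "detector_id": "built-in-detector", "score": 1.0}]}],
        "output": None,
    },
    "warnings": [{"type": "UNSUITABLE_INPUT", "message": "..."}],
}
passed = {
    "choices": [{"index": 0, "message": {"role": "assistant", "content": "hi"}}],
    "detections": None,
    "warnings": None,
}

print(summarize(blocked)["hits"])
print(summarize(passed)["reply"])
```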

Gateway API: preset pipeline (`/all/v1/chat/completions`)

Use the gateway port (8090 or 8490) for chat with a fixed detector pipeline (defined by the gateway ConfigMap). Request body: model, messages (OpenAI style). For per-request detector selection, use the orchestrator API instead.

Use the service address and the gateway port (see Ports and roles).

# Use port 8490 (auth) or 8090 (no auth).
curl -k -s -X POST "https://<service-address>:8490/all/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d '{"model":"<inference-service-name>","messages":[{"content":"my email is test@example.com","role":"user"}]}'

When input and output pass: detections and warnings are null, and choices contains the model reply:

Example response (input/output pass)

{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "logprobs": null,
      "message": {
        "content": "......",
        "role": "assistant"
      }
    }
  ],
  "created": 1773109415,
  "detections": null,
  "id": "chatcmpl-6d190200e68646bba140fa584ce5c301",
  "model": "......",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 28,
    "prompt_tokens": 30,
    "total_tokens": 58
  },
  "warnings": null
}

When the input triggers a detection (for example PII): detections and warnings are set, and choices is empty:

Example response (input triggers a detection)

{
  "choices": [],
  "created": 1773109609,
  "detections": {
    "input": [
      {
        "message_index": 0,
        "results": [
          {
            "detection": "EmailAddress",
            "detection_type": "pii",
            "detector_id": "built-in-detector",
            "end": 28,
            "score": 1.0,
            "start": 12,
            "text": "test@example.com"
          }
        ]
      }
    ],
    "output": null
  },
  "id": "24dcf55e14344b4bbc760944eb6c1630",
  "model": "......",
  "object": "",
  "usage": {
    "completion_tokens": 0,
    "prompt_tokens": 0,
    "total_tokens": 0
  },
  "warnings": [
    {
      "type": "UNSUITABLE_INPUT",
      "message": "Unsuitable input detected. Please check the detected entities on your input and try again with the unsuitable input removed."
    }
  ]
}
