Applying SecurityContextConstraints for Pod Security

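This guide is for platform administrators and security administrators. It walks you through installing a SecurityContextConstraints (SCC) engine on top of an existing Kyverno deployment, and binding SCC profiles to ServiceAccounts, Users, and Groups so that Pod security boundaries are enforced automatically at admission.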

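Introduction

OpenShift's SecurityContextConstraints (SCC) model lets cluster administrators define a library of Pod security profiles and then grant subjects (ServiceAccounts, Users, Groups) permission to use specific profiles. When a Pod is admitted, the platform selects the most appropriate SCC that the subject is allowed to use, fills in missing defaults, and validates the Pod against that profile. The workload does not have to declare every security field itself; the SCC profile does that on its behalf.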

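Vanilla Kubernetes has no equivalent built-in capability. This guide installs a Kyverno-based engine that gives any standard Kubernetes cluster already running Kyverno an SCC-like experience. It uses the following mechanisms:

  • A SecurityContextConstraints CRD (security.alauda.io/v1alpha1) that stores the SCC profiles.
  • Standard Kubernetes RBAC (the use verb plus resourceNames) that binds subjects to profiles, so the operational workflow matches OpenShift (the oc adm policy add-scc-to-user pattern maps over one-to-one).
  • A pair of Kyverno admission policies, one mutating and one validating, that select the right SCC, fill in defaults, and reject any Pod that none of the granted SCCs can accept.
  • Five GlobalContextEntry resources that cache the SCC profiles and the relevant RBAC objects in memory, so admission decisions need no extra API calls.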

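The result: application teams keep writing simple, straightforward Pod manifests, and the cluster automatically constrains them to a security profile that the ServiceAccount is allowed to use. When migrating from OpenShift, the binding model does not need to change.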

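Granting an SCC is a security control change. Application teams should not be given permission to create or modify SCC RBAC bindings directly, because that would let them bypass the cluster security boundary. Application teams should describe their workload requirements, for example anyuid, hostNetwork, or hostPath; a platform or security administrator reviews the request and then binds the least-privileged SCC to the appropriate subject.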

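Roles and responsibilities

Use the following table to determine which parts of this guide apply to you.

| Role | What you do | What you should not do |
| --- | --- | --- |
| Platform or security administrator | Install the SCC engine, approve SCC requests, create SCC RBAC bindings, switch the validating policy from Warn to Deny, and audit exceptions. | Do not grant overly broad SCCs such as privileged, hostaccess, or anyuid without a workload-level justification and an owner. |
| Application manager or application owner | Determine what the workload needs, for example root UID, host networking, host ports, host paths, user namespaces, or a fixed UID range. Give the platform or security administrator the namespace, ServiceAccount, workload name, and reason. After approval, deploy the workload with the assigned ServiceAccount. | Do not create SCC RBAC bindings or grant SCC permissions to your own ServiceAccount. Do not use alauda.io/required-scc unless an administrator asks you to pin a specific SCC. |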

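If you are a platform or security administrator, perform Part 1 and Part 2. If you are an application manager, first use Step 2.1 to prepare an SCC request, then use Step 2.5 and Step 2.6 only after an administrator has approved and bound the SCC. Do not apply the RBAC manifests in Steps 2.2 through 2.4 yourself.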

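The normal workflow is as follows:

  1. The application manager identifies the workload requirements and the target ServiceAccount.
  2. The platform or security administrator selects the least-privileged SCC and creates the RBAC binding.
  3. The application manager deploys the workload with the approved ServiceAccount, and adds alauda.io/required-scc only when an administrator asks for a specific SCC to be pinned.
  4. The administrator verifies the authorization with kubectl auth can-i, and the workload owner verifies that the admitted Pod carries the expected alauda.io/scc annotation.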

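Use cases

Apply this guide in any of the following situations:

  • You are migrating workloads from OpenShift and want to keep the existing oc adm policy add-scc-to-* binding model, so that platform teams and audit tooling keep working as they do today.
  • You already use Kyverno and need a centrally managed security boundary, without requiring every Pod manifest to declare a full securityContext.
  • You run a multi-tenant cluster and want different ServiceAccounts in different namespaces to get different security ceilings: for example, application SAs limited to restricted-v2, a log collector SA allowed hostmount-anyuid, and an Ingress controller SA allowed NET_BIND_SERVICE.
  • You want a single cluster-level place to express and audit "who may run privileged Pods", rather than scattering exceptions across every namespace.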

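Prerequisites

Before you begin, confirm that all of the following are true:

  1. The Kubernetes cluster runs version 1.30 or later (CEL admission is stable).

  2. Kyverno is installed and running at version v4.3.1 or later, and the MutatingPolicy, ValidatingPolicy, and GlobalContextEntry CRDs are available. You can verify this with the following command: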

    kubectl get crd validatingpolicies.policies.kyverno.io mutatingpolicies.policies.kyverno.io globalcontextentries.kyverno.io
  3. The kyverno namespace contains the following ServiceAccounts (present in a default Kyverno installation):

    • kyverno-admission-controller
    • kyverno-background-controller
    • kyverno-reports-controller
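  4. You have cluster-admin (or equivalent) permissions, because installing the engine requires creating CRDs, ClusterRoles, ClusterRoleBindings, GlobalContextEntries, and admission policies.

  5. You have checked the Pod Security Admission (PSA) enforce label on every namespace where you intend to allow non-restricted Pods. PSA runs ahead of the Kyverno validating policy; if a namespace is labeled pod-security.kubernetes.io/enforce: restricted, PSA rejects any Pod in that namespace that relies on a permissive SCC such as anyuid or hostnetwork-v2, no matter which SCC the subject has been granted. Relabel the namespace to baseline or privileged as needed, or limit the set of SCC profiles you offer in those namespaces.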

Tip

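Installing the engine is one-time work, usually done by a platform administrator. Part 2 is also an administrator workflow: a platform or security administrator binds SCC profiles after reviewing the workload requirements. Application teams normally only supply those requirements and then use the assigned ServiceAccount.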

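Steps

The work is split into two parts:

  • Part 1 installs the SCC engine cluster-wide. Do this once per cluster.
  • Part 2 authorizes workloads to use SCCs by binding SCC profiles to ServiceAccounts, Users, and Groups, and pins specific workloads to specific SCCs where needed.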

Part 1: Install the SCC engine

Step 1.1: Install the SecurityContextConstraints CRD

Save the following manifest as scc-crd.yaml. It defines a cluster-scoped SecurityContextConstraints resource (short name scc) whose fields follow OpenShift SCC semantics.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: securitycontextconstraints.security.alauda.io
spec:
  group: security.alauda.io
  names:
    plural: securitycontextconstraints
    singular: securitycontextconstraints
    kind: SecurityContextConstraints
    listKind: SecurityContextConstraintsList
    shortNames:
      - scc
  scope: Cluster
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          description: |
            SecurityContextConstraints governs the ability to make requests that affect
            container security context. This custom CRD mirrors OpenShift SCC semantics
            while keeping fields under spec for Kyverno CEL consumption.
          type: object
          required:
            - spec
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              type: object
              required:
                - runAsUser
              properties:
                allowHostPorts:
                  description: Determines if the profile allows host ports in containers.
                  type: boolean
                priority:
                  description: Higher priority SCC is evaluated first.
                  type: integer
                  format: int32
                  nullable: true
                restrictiveScore:
                  description: Secondary sort key. Lower score means less restrictive.
                  type: integer
                  format: int32
                  minimum: 0
                requiredDropCapabilities:
                  description: Capabilities that must be dropped.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                allowPrivilegedContainer:
                  description: Determines if privileged containers are allowed.
                  type: boolean
                runAsUser:
                  description: Strategy controlling runAsUser.
                  type: object
                  nullable: true
                  properties:
                    type:
                      description: Strategy type for runAsUser.
                      type: string
                      enum:
                        - RunAsAny
                        - MustRunAs
                        - MustRunAsRange
                        - MustRunAsNonRoot
                        - MustRunAsNonRootOrSystem
                    uid:
                      description: Required when type=MustRunAs.
                      type: integer
                      format: int64
                      minimum: 0
                    uidRangeMin:
                      description: Minimum uid for MustRunAsRange.
                      type: integer
                      format: int64
                      minimum: 0
                    uidRangeMax:
                      description: Maximum uid for MustRunAsRange.
                      type: integer
                      format: int64
                      minimum: 0
                users:
                  description: Users who can use this SCC.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                groups:
                  description: Groups who can use this SCC.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                allowHostDirVolumePlugin:
                  description: Determines if hostPath-like volume plugin usage is allowed.
                  type: boolean
                seccompProfiles:
                  description: Allowed seccomp profiles. '*' allows all.
                  type: array
                  nullable: true
                  items:
                    type: string
                    pattern: "^(\\*|runtime/default|unconfined|localhost/.+)$"
                  x-kubernetes-list-type: atomic
                allowHostIPC:
                  description: Determines if host IPC is allowed.
                  type: boolean
                forbiddenSysctls:
                  description: Explicitly forbidden sysctls.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                seLinuxContext:
                  description: Strategy controlling SELinux labels.
                  type: object
                  nullable: true
                  properties:
                    type:
                      description: Strategy type for SELinux context.
                      type: string
                    seLinuxOptions:
                      description: Fixed SELinux options required by MustRunAs.
                      type: object
                      properties:
                        user:
                          type: string
                        role:
                          type: string
                        type:
                          type: string
                        level:
                          type: string
                readOnlyRootFilesystem:
                  description: Forces readOnlyRootFilesystem when set to true.
                  type: boolean
                fsGroup:
                  description: Strategy controlling fsGroup.
                  type: object
                  nullable: true
                  properties:
                    type:
                      type: string
                    ranges:
                      type: array
                      items:
                        type: object
                        properties:
                          min:
                            type: integer
                            format: int64
                          max:
                            type: integer
                            format: int64
                      x-kubernetes-list-type: atomic
                supplementalGroups:
                  description: Strategy controlling supplemental groups.
                  type: object
                  nullable: true
                  properties:
                    type:
                      type: string
                    ranges:
                      type: array
                      items:
                        type: object
                        properties:
                          min:
                            type: integer
                            format: int64
                          max:
                            type: integer
                            format: int64
                      x-kubernetes-list-type: atomic
                userNamespaceLevel:
                  description: Controls host user namespace usage.
                  type: string
                  default: AllowHostLevel
                  enum:
                    - AllowHostLevel
                    - RequirePodLevel
                defaultAddCapabilities:
                  description: Capabilities added by default unless explicitly dropped.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                allowedUnsafeSysctls:
                  description: Explicitly allowed unsafe sysctls.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
                allowedFlexVolumes:
                  description: Allowed flex volume drivers.
                  type: array
                  nullable: true
                  items:
                    type: object
                    required:
                      - driver
                    properties:
                      driver:
                        type: string
                  x-kubernetes-list-type: atomic
                volumes:
                  description: Allowed volume plugin types. '*' allows all.
                  type: array
                  nullable: true
                  items:
                    type: string
                    enum:
                      - '*'
                      - none
                      - hostPath
                      - emptyDir
                      - gcePersistentDisk
                      - awsElasticBlockStore
                      - gitRepo
                      - secret
                      - nfs
                      - iscsi
                      - glusterfs
                      - persistentVolumeClaim
                      - rbd
                      - flexVolume
                      - cinder
                      - cephfs
                      - flocker
                      - downwardAPI
                      - fc
                      - azureFile
                      - configMap
                      - vsphereVolume
                      - quobyte
                      - azureDisk
                      - photonPersistentDisk
                      - projected
                      - portworxVolume
                      - scaleIO
                      - storageos
                      - csi
                      - ephemeral
                      - image
                  x-kubernetes-list-type: atomic
                allowHostPID:
                  description: Determines if host PID is allowed.
                  type: boolean
                allowHostNetwork:
                  description: Determines if hostNetwork is allowed.
                  type: boolean
                allowPrivilegeEscalation:
                  description: Determines if privilege escalation can be requested.
                  type: boolean
                  nullable: true
                defaultAllowPrivilegeEscalation:
                  description: Default for allowPrivilegeEscalation when container omits it.
                  type: boolean
                  nullable: true
                allowedCapabilities:
                  description: Capabilities that may be added.
                  type: array
                  nullable: true
                  items:
                    type: string
                  x-kubernetes-list-type: atomic
              x-kubernetes-validations:
                - rule: "!has(self.runAsUser) || self.runAsUser.type != 'MustRunAs' || has(self.runAsUser.uid)"
                  message: "runAsUser.uid is required when runAsUser.type is MustRunAs."
                - rule: "!has(self.runAsUser) || self.runAsUser.type == 'MustRunAs' || !has(self.runAsUser.uid)"
                  message: "runAsUser.uid is only allowed when runAsUser.type is MustRunAs."
                - rule: "!has(self.runAsUser) || self.runAsUser.type != 'MustRunAsRange' || (has(self.runAsUser.uidRangeMin) && has(self.runAsUser.uidRangeMax))"
                  message: "uidRangeMin and uidRangeMax are required when runAsUser.type is MustRunAsRange."
                - rule: "!has(self.runAsUser) || self.runAsUser.type == 'MustRunAsRange' || (!has(self.runAsUser.uidRangeMin) && !has(self.runAsUser.uidRangeMax))"
                  message: "uidRangeMin and uidRangeMax are only allowed when runAsUser.type is MustRunAsRange."
                - rule: "!has(self.runAsUser) || !has(self.runAsUser.uidRangeMin) || !has(self.runAsUser.uidRangeMax) || self.runAsUser.uidRangeMin <= self.runAsUser.uidRangeMax"
                  message: "uidRangeMin must be less than or equal to uidRangeMax."
      additionalPrinterColumns:
        - name: Priv
          type: string
          description: Determines if privileged containers are allowed
          jsonPath: .spec.allowPrivilegedContainer
        - name: Caps
          type: string
          description: Allowed capabilities
          jsonPath: .spec.allowedCapabilities
        - name: SELinux
          type: string
          description: SELinux strategy
          jsonPath: .spec.seLinuxContext.type
        - name: RunAsUser
          type: string
          description: RunAsUser strategy
          jsonPath: .spec.runAsUser.type
        - name: FSGroup
          type: string
          description: FSGroup strategy
          jsonPath: .spec.fsGroup.type
        - name: SupGroup
          type: string
          description: SupplementalGroups strategy
          jsonPath: .spec.supplementalGroups.type
        - name: Priority
          type: string
          description: SCC sort priority
          jsonPath: .spec.priority
        - name: Score
          type: string
          description: Secondary restrictive score
          jsonPath: .spec.restrictiveScore
        - name: ReadOnlyRootFS
          type: string
          description: Force read-only root filesystem
          jsonPath: .spec.readOnlyRootFilesystem
        - name: Volumes
          type: string
          description: Allowed volume plugins
          jsonPath: .spec.volumes
  conversion:
    strategy: None

Apply it, and wait for the CRD to become Established before continuing:

kubectl apply -f scc-crd.yaml
kubectl wait --for=condition=Established --timeout=120s \
  crd/securitycontextconstraints.security.alauda.io
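The five x-kubernetes-validations rules at the end of the CRD schema constrain how the runAsUser strategy may be combined with uid, uidRangeMin, and uidRangeMax. As a rough illustration only, the same checks can be written in Python. The function name validate_run_as_user is hypothetical and not part of the engine, and the sketch treats a missing type as "no strategy" rather than reproducing CEL's error behavior.

```python
from typing import Any


def validate_run_as_user(run_as_user: dict[str, Any]) -> list[str]:
    """Return the messages of the CRD rules that this runAsUser block violates.

    Hypothetical helper mirroring the x-kubernetes-validations rules; it is not
    part of the SCC engine.
    """
    errors: list[str] = []
    strategy = run_as_user.get("type")
    has_uid = "uid" in run_as_user
    has_min = "uidRangeMin" in run_as_user
    has_max = "uidRangeMax" in run_as_user

    # Rule 1: MustRunAs needs a fixed uid.
    if strategy == "MustRunAs" and not has_uid:
        errors.append("runAsUser.uid is required when runAsUser.type is MustRunAs.")
    # Rule 2: a fixed uid is meaningless for every other strategy.
    if strategy != "MustRunAs" and has_uid:
        errors.append("runAsUser.uid is only allowed when runAsUser.type is MustRunAs.")
    # Rule 3: MustRunAsRange needs both bounds.
    if strategy == "MustRunAsRange" and not (has_min and has_max):
        errors.append(
            "uidRangeMin and uidRangeMax are required when runAsUser.type is MustRunAsRange."
        )
    # Rule 4: range bounds are rejected for every other strategy.
    if strategy != "MustRunAsRange" and (has_min or has_max):
        errors.append(
            "uidRangeMin and uidRangeMax are only allowed when runAsUser.type is MustRunAsRange."
        )
    # Rule 5: the range must not be inverted.
    if has_min and has_max and run_as_user["uidRangeMin"] > run_as_user["uidRangeMax"]:
        errors.append("uidRangeMin must be less than or equal to uidRangeMax.")
    return errors


# A valid range strategy produces no errors; an inverted range produces one.
print(validate_run_as_user({"type": "MustRunAsRange", "uidRangeMin": 1, "uidRangeMax": 65534}))
print(validate_run_as_user({"type": "MustRunAsRange", "uidRangeMin": 10, "uidRangeMax": 5}))
```

On a real cluster the API server evaluates the CEL rules itself when an SCC object is created or updated; the sketch is only a way to reason about which combinations will be rejected.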

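Step 1.2: Install the 13 built-in SCC profiles

Save the following manifest as scc-profiles.yaml. It defines 13 SCC profiles modeled on the OpenShift built-in set, ordered from most to least restrictive (restrictiveScore: 100 down to restrictiveScore: 0). When a subject has been granted several SCCs, the auto-pick policy prefers the one with the higher restrictiveScore among SCCs of equal priority.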

Tip

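You do not have to install every profile. You can trim this manifest to the subset your platform actually offers, but make sure every subject has at least one usable profile; otherwise its Pods are rejected at admission.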

apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: restricted-v2
spec:
  priority: 0
  restrictiveScore: 100
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: false
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1
    uidRangeMax: 2147483647
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities:
    - NET_BIND_SERVICE
  requiredDropCapabilities:
    - ALL
  defaultAddCapabilities: []
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  seccompProfiles:
    - runtime/default
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: restricted-v3
spec:
  priority: 0
  restrictiveScore: 100
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: false
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1000
    uidRangeMax: 65534
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
    ranges:
      - min: 1000
        max: 65534
  supplementalGroups:
    type: MustRunAs
    ranges:
      - min: 1000
        max: 65534
  userNamespaceLevel: RequirePodLevel
  allowedCapabilities:
    - NET_BIND_SERVICE
  requiredDropCapabilities:
    - ALL
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  seccompProfiles:
    - runtime/default
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: restricted
spec:
  priority: 0
  restrictiveScore: 98
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1
    uidRangeMax: 2147483647
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - KILL
    - MKNOD
    - SETUID
    - SETGID
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: nonroot-v2
spec:
  priority: 0
  restrictiveScore: 95
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: false
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsNonRoot
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities:
    - NET_BIND_SERVICE
  requiredDropCapabilities:
    - ALL
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  seccompProfiles:
    - runtime/default
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: nonroot
spec:
  priority: 0
  restrictiveScore: 92
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsNonRoot
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - KILL
    - MKNOD
    - SETUID
    - SETGID
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: hostnetwork-v2
spec:
  priority: 0
  restrictiveScore: 70
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: false
  allowHostNetwork: true
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: true
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1
    uidRangeMax: 2147483647
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
  supplementalGroups:
    type: MustRunAs
  allowedCapabilities:
    - NET_BIND_SERVICE
  requiredDropCapabilities:
    - ALL
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  seccompProfiles:
    - runtime/default
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: hostnetwork
spec:
  priority: 0
  restrictiveScore: 68
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: true
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: true
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1
    uidRangeMax: 2147483647
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
  supplementalGroups:
    type: MustRunAs
  allowedCapabilities: []
  requiredDropCapabilities:
    - KILL
    - MKNOD
    - SETUID
    - SETGID
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: anyuid
spec:
  priority: 10
  restrictiveScore: 60
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: RunAsAny
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - MKNOD
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: nested-container
spec:
  priority: 0
  restrictiveScore: 58
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: false
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 0
    uidRangeMax: 65534
  seLinuxContext:
    type: MustRunAs
    seLinuxOptions:
      type: container_engine_t
  fsGroup:
    type: MustRunAs
    ranges:
      - min: 0
        max: 65534
  supplementalGroups:
    type: MustRunAs
    ranges:
      - min: 0
        max: 65534
  userNamespaceLevel: RequirePodLevel
  allowedCapabilities:
    - SETUID
    - SETGID
  requiredDropCapabilities: []
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - image
    - persistentVolumeClaim
    - projected
    - secret
  seccompProfiles:
    - '*'
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: hostmount-anyuid
spec:
  priority: 0
  restrictiveScore: 55
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: true
  runAsUser:
    type: RunAsAny
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - MKNOD
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - hostPath
    - image
    - nfs
    - persistentVolumeClaim
    - projected
    - secret
  readOnlyRootFilesystem: false
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: hostmount-anyuid-v2
spec:
  priority: 0
  restrictiveScore: 50
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: false
  allowHostPID: false
  allowHostIPC: false
  allowHostPorts: false
  allowHostDirVolumePlugin: true
  runAsUser:
    type: RunAsAny
  seLinuxContext:
    type: RunAsAny
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - MKNOD
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - hostPath
    - image
    - nfs
    - persistentVolumeClaim
    - projected
    - secret
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: hostaccess
spec:
  priority: 0
  restrictiveScore: 40
  allowPrivilegedContainer: false
  allowPrivilegeEscalation: true
  allowHostNetwork: true
  allowHostPID: true
  allowHostIPC: true
  allowHostPorts: true
  allowHostDirVolumePlugin: true
  runAsUser:
    type: MustRunAsRange
    uidRangeMin: 1
    uidRangeMax: 2147483647
  seLinuxContext:
    type: MustRunAs
  fsGroup:
    type: MustRunAs
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities: []
  requiredDropCapabilities:
    - KILL
    - MKNOD
    - SETUID
    - SETGID
  volumes:
    - configMap
    - csi
    - downwardAPI
    - emptyDir
    - ephemeral
    - hostPath
    - image
    - persistentVolumeClaim
    - projected
    - secret
---
apiVersion: security.alauda.io/v1alpha1
kind: SecurityContextConstraints
metadata:
  name: privileged
spec:
  priority: 0
  restrictiveScore: 0
  allowPrivilegedContainer: true
  allowPrivilegeEscalation: true
  allowHostNetwork: true
  allowHostPID: true
  allowHostIPC: true
  allowHostPorts: true
  allowHostDirVolumePlugin: true
  runAsUser:
    type: RunAsAny
  seLinuxContext:
    type: RunAsAny
  fsGroup:
    type: RunAsAny
  supplementalGroups:
    type: RunAsAny
  allowedCapabilities:
    - '*'
  requiredDropCapabilities: []
  volumes:
    - '*'
  seccompProfiles:
    - '*'
  allowedUnsafeSysctls:
    - '*'

Apply the profiles:

kubectl apply -f scc-profiles.yaml
kubectl get scc

You should see all 13 profiles listed, with the Priority and Score columns populated (along with the other SCC columns such as Priv, RunAsUser, and Volumes).
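The CRD describes priority as the primary sort key ("Higher priority SCC is evaluated first") and restrictiveScore as the secondary one. The following Python sketch shows what that ordering implies for a subject that has been granted several profiles. It is an illustration under stated assumptions, not the engine's CEL: the Scc tuple and evaluation_order function are hypothetical, and breaking remaining ties by name is an assumption made here so that the result is deterministic.

```python
from typing import NamedTuple


class Scc(NamedTuple):
    """Minimal stand-in for an SCC profile: only the two sort keys are modeled."""
    name: str
    priority: int
    restrictive_score: int


def evaluation_order(granted: list[Scc]) -> list[str]:
    """Order granted SCCs by priority (high first), then restrictiveScore (high first).

    The final tie-break on name is an assumption of this sketch.
    """
    ordered = sorted(granted, key=lambda s: (-s.priority, -s.restrictive_score, s.name))
    return [s.name for s in ordered]


# Values copied from the profiles in scc-profiles.yaml.
granted = [
    Scc("privileged", priority=0, restrictive_score=0),
    Scc("restricted-v2", priority=0, restrictive_score=100),
    Scc("anyuid", priority=10, restrictive_score=60),
    Scc("nonroot-v2", priority=0, restrictive_score=95),
]

print(evaluation_order(granted))
# ['anyuid', 'restricted-v2', 'nonroot-v2', 'privileged']
```

Note that anyuid is the only profile in the manifest with a non-zero priority, so a subject that holds it has anyuid considered ahead of the more restrictive profiles. That is one reason to grant it sparingly.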

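Step 1.3: Install the GlobalContextEntries, the Kyverno read RBAC, and the admission policies

This step installs three things in one pass:

  1. GlobalContextEntry (GCE): five in-memory caches that Kyverno uses at admission to look up SCC profiles, ClusterRoles, ClusterRoleBindings, RoleBindings, and Roles without making an API call for every request.
  2. Read RBAC: a ClusterRole that grants Kyverno's three ServiceAccounts read access to the SCC CRD, the four RBAC resource types above, and the pods and pods/ephemeralcontainers resources that the policies match.
  3. Two admission policies: a MutatingPolicy that fills in the defaults of the selected SCC, and a ValidatingPolicy that rejects any Pod that none of the granted SCCs accepts.

Warning

Both policies contain the CEL logic that drives SCC selection and validation. You do not need to read or understand these CEL expressions to use the engine; apply the manifests as-is. The expressions are long because they reproduce the OpenShift SCC admission algorithm field by field.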

Save the following as scc-gce.yaml and apply it:

apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: scc-profiles
spec:
  kubernetesResource:
    group: security.alauda.io
    version: v1alpha1
    resource: securitycontextconstraints
  projections:
    - name: items
      jmesPath: "@"
---
apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: scc-clusterroles
spec:
  kubernetesResource:
    group: rbac.authorization.k8s.io
    version: v1
    resource: clusterroles
  projections:
    - name: items
      jmesPath: "@"
---
apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: scc-clusterrolebindings
spec:
  kubernetesResource:
    group: rbac.authorization.k8s.io
    version: v1
    resource: clusterrolebindings
  projections:
    - name: items
      jmesPath: "@"
---
apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: scc-rolebindings
spec:
  kubernetesResource:
    group: rbac.authorization.k8s.io
    version: v1
    resource: rolebindings
  projections:
    - name: items
      jmesPath: "@"
---
apiVersion: kyverno.io/v2alpha1
kind: GlobalContextEntry
metadata:
  name: scc-roles
spec:
  kubernetesResource:
    group: rbac.authorization.k8s.io
    version: v1
    resource: roles
  projections:
    - name: items
      jmesPath: "@"

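Save the following as scc-reader-rbac.yaml and apply it. Read access to pods and pods/ephemeralcontainers is required because Kyverno checks read permission on every matched resource during its policy readiness gate (RBACPermissionsGranted); without it, the mutating policy stays NotReady.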

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kyverno-scc-reader
rules:
  - apiGroups:
      - security.alauda.io
    resources:
      - securitycontextconstraints
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - rbac.authorization.k8s.io
    resources:
      - clusterroles
      - clusterrolebindings
      - rolebindings
      - roles
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/ephemeralcontainers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kyverno-scc-reader
subjects:
  - kind: ServiceAccount
    name: kyverno-admission-controller
    namespace: kyverno
  - kind: ServiceAccount
    name: kyverno-background-controller
    namespace: kyverno
  - kind: ServiceAccount
    name: kyverno-reports-controller
    namespace: kyverno
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kyverno-scc-reader
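The cached RBAC objects are what let the policies decide which SCCs a subject may use: a subject is granted an SCC when a binding points it at a role whose rules allow the use verb on securitycontextconstraints with that SCC listed in resourceNames. The Python sketch below illustrates the idea for ServiceAccounts bound through ClusterRoleBindings only. It is a simplification, not the engine's CEL: the function names and the sample object names (use-scc-hostmount-anyuid, log-collector-scc, log-collector) are hypothetical, and it ignores RoleBindings, Roles, User and Group subjects, wildcards, and rules without resourceNames.

```python
SCC_API_GROUP = "security.alauda.io"
SCC_RESOURCE = "securitycontextconstraints"


def sccs_named_in_rules(rules):
    """Return the SCC names that the rules grant through the 'use' verb."""
    names = set()
    for rule in rules or []:
        if (SCC_API_GROUP in rule.get("apiGroups", [])
                and SCC_RESOURCE in rule.get("resources", [])
                and "use" in rule.get("verbs", [])):
            names.update(rule.get("resourceNames", []))
    return names


def binding_has_service_account(binding, namespace, name):
    """True if the binding lists the given ServiceAccount as a subject."""
    return any(
        s.get("kind") == "ServiceAccount"
        and s.get("namespace") == namespace
        and s.get("name") == name
        for s in binding.get("subjects") or []
    )


def granted_sccs(cluster_roles, cluster_role_bindings, namespace, name):
    """Resolve the SCC names a ServiceAccount may use via ClusterRoleBindings."""
    roles_by_name = {r["metadata"]["name"]: r for r in cluster_roles}
    granted = set()
    for binding in cluster_role_bindings:
        if binding["roleRef"]["kind"] != "ClusterRole":
            continue
        if not binding_has_service_account(binding, namespace, name):
            continue
        role = roles_by_name.get(binding["roleRef"]["name"])
        if role is not None:
            granted |= sccs_named_in_rules(role.get("rules"))
    return granted


# Hypothetical sample objects, shaped like the items in the RBAC caches.
cluster_roles = [{
    "metadata": {"name": "use-scc-hostmount-anyuid"},
    "rules": [{
        "apiGroups": ["security.alauda.io"],
        "resources": ["securitycontextconstraints"],
        "verbs": ["use"],
        "resourceNames": ["hostmount-anyuid"],
    }],
}]
cluster_role_bindings = [{
    "metadata": {"name": "log-collector-scc"},
    "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "ClusterRole",
                "name": "use-scc-hostmount-anyuid"},
    "subjects": [{"kind": "ServiceAccount", "namespace": "logging", "name": "log-collector"}],
}]

print(granted_sccs(cluster_roles, cluster_role_bindings, "logging", "log-collector"))
# {'hostmount-anyuid'}
```

Because the lookup runs against in-memory copies of these objects, a newly created binding only takes effect once the corresponding cache has picked it up.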

Save the following as scc-auto-pick.yaml. This is the ValidatingPolicy that rejects any Pod that none of the granted SCCs accepts.

Warning

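The example below is configured with validationActions: [Deny]. On an existing cluster, change it to validationActions: [Warn] before applying it for the first time, and switch back to Deny only after you have reviewed the warnings and created the required SCC bindings. See Step 1.4 for the rollout procedure.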

apiVersion: policies.kyverno.io/v1alpha1
kind: ValidatingPolicy
metadata:
  name: scc-auto-pick
  annotations:
    policies.kyverno.io/title: SCC Auto-Pick (CEL, CRD + RBAC)
    pod-policies.kyverno.io/autogen-controllers: "none"
spec:
  autogen:
    podControllers:
      controllers: []
    validatingAdmissionPolicy:
      enabled: false
  evaluation:
    admission:
      enabled: true
    background:
      enabled: false
  failurePolicy: Fail
  validationActions:
    - Deny
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods"]
  matchConditions:
    - name: skip-system-ns
      expression: |
        !(request.namespace.startsWith('kube-') ||
          request.namespace.startsWith('cpaas-') ||
          request.namespace.startsWith('alauda-') ||
          request.namespace == 'kyverno' ||
          request.namespace == 'cattle-system' ||
          request.namespace == 'operators' ||
          request.namespace == 'default')

  variables:
    - name: containers
      expression: |
        object.spec.containers + object.spec.?initContainers.orValue([]) +
        object.spec.?ephemeralContainers.orValue([])

    - name: required
      expression: object.metadata.?annotations[?'alauda.io/required-scc'].orValue('')

    - name: profiles
      expression: |
        cel.bind(items, globalContext.Get('scc-profiles', 'items'),
          items == null ? [] : items)

    - name: subjectMatches
      expression: |
        [
          {'kind':'ServiceAccount',
           'name': string(object.spec.?serviceAccountName.orValue('default')),
           'namespace': string(request.namespace)},
          {'kind':'Group', 'name':'system:serviceaccounts'},
          {'kind':'Group', 'name':'system:serviceaccounts:'+request.namespace},
          {'kind':'Group', 'name':'system:authenticated'},
          {'kind':'User',  'name': request.userInfo.username}
        ]
        + request.userInfo.groups.map(g, {'kind':'Group','name': g})

    - name: rolebindings
      expression: |
        cel.bind(rbs, globalContext.Get('scc-rolebindings','items'),
          rbs == null ? [] : rbs)

    - name: matchedClusterRoleRefsFromCRB
      expression: |
        cel.bind(crbs, globalContext.Get('scc-clusterrolebindings','items'),
          crbs == null ? [] : crbs)
        .filter(b, b.?roleRef.?kind.orValue('') == 'ClusterRole'
                && b.?subjects.orValue([]).exists(s,
            variables.subjectMatches.exists(m,
              s.kind == m.kind && s.name == m.name &&
              (s.kind != 'ServiceAccount' ||
               s.?namespace.orValue('') == m.?namespace.orValue('')))))
        .map(b, b.roleRef.name)

    - name: matchedClusterRoleRefsFromRB
      expression: |
        variables.rolebindings
          .filter(b, b.?metadata.?namespace.orValue('') == request.namespace
                  && b.?roleRef.?kind.orValue('') == 'ClusterRole'
                  && b.?subjects.orValue([]).exists(s,
              variables.subjectMatches.exists(m,
                s.kind == m.kind && s.name == m.name &&
                (s.kind != 'ServiceAccount' ||
                 s.?namespace.orValue('') == m.?namespace.orValue('')))))
          .map(b, b.roleRef.name)

    - name: matchedRoleRefsFromRB
      expression: |
        variables.rolebindings
          .filter(b, b.?metadata.?namespace.orValue('') == request.namespace
                  && b.?roleRef.?kind.orValue('') == 'Role'
                  && b.?subjects.orValue([]).exists(s,
              variables.subjectMatches.exists(m,
                s.kind == m.kind && s.name == m.name &&
                (s.kind != 'ServiceAccount' ||
                 s.?namespace.orValue('') == m.?namespace.orValue('')))))
          .map(b, b.roleRef.name)

    - name: matchedClusterRoleRefs
      expression: |
        variables.matchedClusterRoleRefsFromCRB + variables.matchedClusterRoleRefsFromRB

    - name: allSccNames
      expression: |
        variables.profiles.map(p, p.metadata.name)

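    # SCC names granted through the matched ClusterRoles. A "use" rule with no
    # resourceNames grants every known SCC.
    - name: assignedFromClusterRoles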
      expression: |
        cel.bind(crs, globalContext.Get('scc-clusterroles','items'),
          crs == null ? [] : crs)
          .filter(r, variables.matchedClusterRoleRefs.exists(n, n == r.metadata.name))
          .map(r, r.?rules.orValue([])
            .filter(ru,
              ru.?apiGroups.orValue([]).exists(g, g == 'security.alauda.io' || g == '*') &&
              ru.?resources.orValue([]).exists(x, x == 'securitycontextconstraints' || x == '*') &&
              ru.?verbs.orValue([]).exists(v, v == 'use' || v == '*'))
            .map(ru,
              ru.?resourceNames.orValue([]).size() == 0
                ? variables.allSccNames
                : ru.resourceNames)
          )
          .flatten()
          .flatten()

    - name: assignedFromRoles
      expression: |
        cel.bind(roles, globalContext.Get('scc-roles','items'),
          roles == null ? [] : roles)
          .filter(r,
            r.?metadata.?namespace.orValue('') == request.namespace
            && variables.matchedRoleRefsFromRB.exists(n, n == r.metadata.name))
          .map(r, r.?rules.orValue([])
            .filter(ru,
              ru.?apiGroups.orValue([]).exists(g, g == 'security.alauda.io' || g == '*') &&
              ru.?resources.orValue([]).exists(x, x == 'securitycontextconstraints' || x == '*') &&
              ru.?verbs.orValue([]).exists(v, v == 'use' || v == '*'))
            .map(ru,
              ru.?resourceNames.orValue([]).size() == 0
                ? variables.allSccNames
                : ru.resourceNames)
          )
          .flatten()
          .flatten()

    - name: assigned
      expression: |
        (variables.assignedFromClusterRoles + variables.assignedFromRoles)
          .filter(n, variables.allSccNames.exists(s, s == n))

    - name: safeSysctls
      expression: |
        ['kernel.shm_rmid_forced',
         'net.ipv4.ip_local_port_range',
         'net.ipv4.ip_unprivileged_port_start',
         'net.ipv4.tcp_syncookies',
         'net.ipv4.ping_group_range']

    - name: vtypes
      expression: |
        ['hostPath','emptyDir','gcePersistentDisk','awsElasticBlockStore','gitRepo',
         'secret','nfs','iscsi','glusterfs','persistentVolumeClaim','rbd','flexVolume',
         'cinder','cephfs','flocker','downwardAPI','fc','azureFile','configMap',
         'vsphereVolume','quobyte','azureDisk','photonPersistentDisk','projected',
         'portworxVolume','scaleIO','storageos','csi','ephemeral','image']

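    # Candidate order: higher spec.priority first, then higher spec.restrictiveScore
    # (defaults 0 and 100). sortBy sorts ascending, hence the negated key.
    - name: ordered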
      expression: |
        variables.assigned.sortBy(n,
          int(variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.?priority.orValue(0)) * -100000 +
          -int(variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.?restrictiveScore.orValue(100))
        )
    - name: requiredExists
      expression: variables.required == '' || variables.profiles.exists(pr, pr.metadata.name == variables.required)
    - name: requiredBound
      expression: variables.required == '' || variables.assigned.exists(n, n == variables.required)
    - name: candidateNames
      expression: |
        variables.required != ''
          ? [variables.required]
          : variables.ordered

    - name: matched
      expression: |
        variables.candidateNames.exists(n,
          cel.bind(p, variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.orValue({}),
             (p.?allowPrivilegedContainer.orValue(false)
               || !variables.containers.exists(c, c.?securityContext.?privileged.orValue(false)))
          && (p.?allowPrivilegeEscalation.orValue(true)
               || !variables.containers.exists(c, c.?securityContext.?allowPrivilegeEscalation.orValue(true)))
          && (p.?allowHostNetwork.orValue(false) || !object.spec.?hostNetwork.orValue(false))
          && (p.?allowHostPID.orValue(false)     || !object.spec.?hostPID.orValue(false))
          && (p.?allowHostIPC.orValue(false)     || !object.spec.?hostIPC.orValue(false))
          && (p.?allowHostDirVolumePlugin.orValue(false)
               || !object.spec.?volumes.orValue([]).exists(v, has(v.hostPath)))
          && (
               p.?runAsUser.?type.orValue('RunAsAny') == 'RunAsAny'
               || (
                    (p.?runAsUser.?type.orValue('RunAsAny') in ['MustRunAsNonRoot','MustRunAsNonRootOrSystem'])
                    && !variables.containers.exists(c, c.?securityContext.?runAsUser.orValue(
                         object.spec.?securityContext.?runAsUser.orValue(1)) == 0)
                  )
               || (
                    p.?runAsUser.?type.orValue('RunAsAny') == 'MustRunAs'
                    && variables.containers.all(c, c.?securityContext.?runAsUser.orValue(
                         object.spec.?securityContext.?runAsUser.orValue(1))
                         == p.?runAsUser.?uid.orValue(-1))
                  )
               || (
                    p.?runAsUser.?type.orValue('RunAsAny') == 'MustRunAsRange'
                    && variables.containers.all(c,
                         c.?securityContext.?runAsUser.orValue(
                           object.spec.?securityContext.?runAsUser.orValue(1))
                           >= p.?runAsUser.?uidRangeMin.orValue(1)
                         && c.?securityContext.?runAsUser.orValue(
                              object.spec.?securityContext.?runAsUser.orValue(1))
                           <= p.?runAsUser.?uidRangeMax.orValue(2147483647))
                  )
             )
          && (p.?allowedCapabilities.orValue([]).exists(t, t == '*')
               || variables.containers.all(c,
                    c.?securityContext.?capabilities.?add.orValue([]).all(cap,
                      p.?allowedCapabilities.orValue([]).exists(a, a == cap))))
          && (p.?requiredDropCapabilities.orValue([]).size() == 0
               || variables.containers.all(c,
                    p.?requiredDropCapabilities.orValue([]).all(req,
                      c.?securityContext.?capabilities.?drop.orValue([]).exists(d, d == req || d == 'ALL'))))
          && (p.?volumes.orValue(['*']).exists(t, t == '*')
               || object.spec.?volumes.orValue([]).all(v,
                    variables.vtypes.filter(t, v[?t].hasValue()).all(t,
                      p.?volumes.orValue([]).exists(a, a == t))))
          && (p.?allowHostPorts.orValue(false)
               || variables.containers.all(c,
                    c.?ports.orValue([]).all(port, port.?hostPort.orValue(0) == 0)))
          && (p.?allowedUnsafeSysctls.orValue([]).exists(t, t == '*')
               || object.spec.?securityContext.?sysctls.orValue([]).all(s,
                    variables.safeSysctls.exists(safe, safe == s.name)
                    || p.?allowedUnsafeSysctls.orValue([]).exists(a, a == s.name)))
          && (!p.?readOnlyRootFilesystem.orValue(false)
               || variables.containers.all(c, c.?securityContext.?readOnlyRootFilesystem.orValue(false) == true))
          && (p.?seccompProfiles.orValue([]).size() == 0
               || p.?seccompProfiles.orValue([]).exists(t, t == '*')
               || variables.containers.all(c,
                    p.?seccompProfiles.orValue([]).exists(a,
                      (c.?securityContext.?seccompProfile.?type.orValue(
                         object.spec.?securityContext.?seccompProfile.?type.orValue('')) == 'RuntimeDefault'
                         && a == 'runtime/default')
                      || (c.?securityContext.?seccompProfile.?type.orValue(
                         object.spec.?securityContext.?seccompProfile.?type.orValue('')) == 'Unconfined'
                         && a == 'unconfined')
                      || (c.?securityContext.?seccompProfile.?type.orValue(
                         object.spec.?securityContext.?seccompProfile.?type.orValue('')) == 'Localhost'
                         && a == 'localhost/' + c.?securityContext.?seccompProfile.?localhostProfile.orValue(
                              object.spec.?securityContext.?seccompProfile.?localhostProfile.orValue(''))))))
          && (p.?allowedFlexVolumes.orValue([]).size() == 0
               || object.spec.?volumes.orValue([]).filter(v, v.?flexVolume.hasValue()).all(v,
                    p.?allowedFlexVolumes.orValue([]).exists(d, d.?driver.orValue('') == v.flexVolume.driver)))
          )
        )

  validations:
    - expression: variables.requiredExists
      message: "required-scc does not exist"
      messageExpression: |
        "required SCC '" + variables.required + "' not found in scc-profiles"
    - expression: variables.requiredBound
      message: "required-scc is not bound to ServiceAccount"
      messageExpression: |
        "required SCC '" + variables.required +
        "' is not bound to ServiceAccount '" +
        object.spec.?serviceAccountName.orValue('default') +
        "' in namespace '" + request.namespace + "'"
    - expression: variables.matched
      message: "Pod violates all SCCs assigned to its ServiceAccount"
      messageExpression: |
        variables.required != ''
        ? ("Pod " + object.metadata.name +
           " does not satisfy required SCC '" + variables.required + "'")
        : ("Pod " + object.metadata.name +
           " does not satisfy any SCC profile assigned to ServiceAccount '" +
           object.spec.?serviceAccountName.orValue('default') +
           "' in namespace '" + request.namespace +
           "' (candidates: " + variables.ordered.join(",") + ")")

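To make the three validations concrete, here is a sketch of a Pod that pins an SCC. It assumes that a hypothetical SCC named example-nonroot exists and that an administrator has bound it to the hypothetical ServiceAccount web in namespace demo. The image reference is a placeholder.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-0                              # hypothetical Pod name
  namespace: demo                          # hypothetical namespace
  annotations:
    # Pinning is optional; use it only when an administrator asks for it.
    alauda.io/required-scc: example-nonroot
spec:
  serviceAccountName: web                  # hypothetical ServiceAccount
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/app:1.0  # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
```

For a Pod like this, requiredExists checks that example-nonroot is present in scc-profiles, requiredBound checks that it is among the SCCs assigned to the Pod's subjects, and matched evaluates the Pod against that one profile only. Without the annotation, matched instead tries every assigned SCC in priority order.
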
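Save the following as scc-fill-defaults.yaml and apply it. It is a MutatingPolicy that records the selected SCC on the Pod (the alauda.io/scc annotation) and fills in the runAsUser, seccompProfile, and allowPrivilegeEscalation defaults inherited from that SCC.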

apiVersion: policies.kyverno.io/v1alpha1
kind: MutatingPolicy
metadata:
  name: scc-fill-defaults
  annotations:
    policies.kyverno.io/title: SCC default value filler (CRD + RBAC, explicit-wins)
    pod-policies.kyverno.io/autogen-controllers: "none"
spec:
  autogen:
    podControllers:
      controllers: []
  evaluation:
    admission:
      enabled: true
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["UPDATE"]
        resources: ["pods/ephemeralcontainers"]
  matchConditions:
    - name: skip-system-ns
      expression: |
        !(request.namespace.startsWith('kube-') ||
          request.namespace.startsWith('cpaas-') ||
          request.namespace.startsWith('alauda-') ||
          request.namespace == 'kyverno' ||
          request.namespace == 'cattle-system' ||
          request.namespace == 'operators' ||
          request.namespace == 'default')

  variables:
    - name: containers
      expression: |
        object.spec.containers + object.spec.?initContainers.orValue([]) +
        object.spec.?ephemeralContainers.orValue([])
    - name: required
      expression: object.metadata.?annotations[?'alauda.io/required-scc'].orValue('')

    - name: profiles
      expression: |
        cel.bind(items, globalContext.Get('scc-profiles', 'items'),
          items == null ? [] : items)

    - name: subjectMatches
      expression: |
        [
          {'kind':'ServiceAccount',
           'name': string(object.spec.?serviceAccountName.orValue('default')),
           'namespace': string(object.metadata.namespace)},
          {'kind':'Group', 'name':'system:serviceaccounts'},
          {'kind':'Group', 'name':'system:serviceaccounts:'+object.metadata.namespace},
          {'kind':'Group', 'name':'system:authenticated'},
          {'kind':'User',  'name': request.userInfo.username}
        ]
        + request.userInfo.groups.map(g, {'kind':'Group','name': g})
    - name: rolebindings
      expression: |
        cel.bind(rbs, globalContext.Get('scc-rolebindings','items'),
          rbs == null ? [] : rbs)
    - name: matchedClusterRoleRefsFromCRB
      expression: |
        cel.bind(crbs, globalContext.Get('scc-clusterrolebindings','items'),
          crbs == null ? [] : crbs)
        .filter(b, b.?roleRef.?kind.orValue('') == 'ClusterRole'
                && b.?subjects.orValue([]).exists(s,
            variables.subjectMatches.exists(m,
              s.kind == m.kind && s.name == m.name &&
              (s.kind != 'ServiceAccount' ||
               s.?namespace.orValue('') == m.?namespace.orValue('')))))
        .map(b, b.roleRef.name)
    - name: matchedClusterRoleRefsFromRB
      expression: |
        variables.rolebindings
          .filter(b, b.?metadata.?namespace.orValue('') == object.metadata.namespace
                  && b.?roleRef.?kind.orValue('') == 'ClusterRole'
                  && b.?subjects.orValue([]).exists(s,
              variables.subjectMatches.exists(m,
                s.kind == m.kind && s.name == m.name &&
                (s.kind != 'ServiceAccount' ||
                 s.?namespace.orValue('') == m.?namespace.orValue('')))))
          .map(b, b.roleRef.name)
    - name: matchedRoleRefsFromRB
      expression: |
        variables.rolebindings
          .filter(b, b.?metadata.?namespace.orValue('') == object.metadata.namespace
                  && b.?roleRef.?kind.orValue('') == 'Role'
                  && b.?subjects.orValue([]).exists(s,
              variables.subjectMatches.exists(m,
                s.kind == m.kind && s.name == m.name &&
                (s.kind != 'ServiceAccount' ||
                 s.?namespace.orValue('') == m.?namespace.orValue('')))))
          .map(b, b.roleRef.name)
    - name: matchedClusterRoleRefs
      expression: |
        variables.matchedClusterRoleRefsFromCRB + variables.matchedClusterRoleRefsFromRB
    - name: allSccNames
      expression: |
        variables.profiles.map(p, p.metadata.name)
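    # SCC names granted through the matched ClusterRoles. A "use" rule with no
    # resourceNames grants every known SCC.
    - name: assignedFromClusterRoles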
      expression: |
        cel.bind(crs, globalContext.Get('scc-clusterroles','items'),
          crs == null ? [] : crs)
          .filter(r, variables.matchedClusterRoleRefs.exists(n, n == r.metadata.name))
          .map(r, r.?rules.orValue([])
            .filter(ru,
              ru.?apiGroups.orValue([]).exists(g, g == 'security.alauda.io' || g == '*') &&
              ru.?resources.orValue([]).exists(x, x == 'securitycontextconstraints' || x == '*') &&
              ru.?verbs.orValue([]).exists(v, v == 'use' || v == '*'))
            .map(ru,
              ru.?resourceNames.orValue([]).size() == 0
                ? variables.allSccNames
                : ru.resourceNames)
          )
          .flatten()
          .flatten()

    - name: assignedFromRoles
      expression: |
        cel.bind(roles, globalContext.Get('scc-roles','items'),
          roles == null ? [] : roles)
          .filter(r,
            r.?metadata.?namespace.orValue('') == object.metadata.namespace
            && variables.matchedRoleRefsFromRB.exists(n, n == r.metadata.name))
          .map(r, r.?rules.orValue([])
            .filter(ru,
              ru.?apiGroups.orValue([]).exists(g, g == 'security.alauda.io' || g == '*') &&
              ru.?resources.orValue([]).exists(x, x == 'securitycontextconstraints' || x == '*') &&
              ru.?verbs.orValue([]).exists(v, v == 'use' || v == '*'))
            .map(ru,
              ru.?resourceNames.orValue([]).size() == 0
                ? variables.allSccNames
                : ru.resourceNames)
          )
          .flatten()
          .flatten()

    - name: assigned
      expression: |
        (variables.assignedFromClusterRoles + variables.assignedFromRoles)
          .filter(n, variables.allSccNames.exists(s, s == n))

    - name: safeSysctls
      expression: |
        ['kernel.shm_rmid_forced',
         'net.ipv4.ip_local_port_range',
         'net.ipv4.ip_unprivileged_port_start',
         'net.ipv4.tcp_syncookies',
         'net.ipv4.ping_group_range']
    - name: vtypes
      expression: |
        ['hostPath','emptyDir','gcePersistentDisk','awsElasticBlockStore','gitRepo',
         'secret','nfs','iscsi','glusterfs','persistentVolumeClaim','rbd','flexVolume',
         'cinder','cephfs','flocker','downwardAPI','fc','azureFile','configMap',
         'vsphereVolume','quobyte','azureDisk','photonPersistentDisk','projected',
         'portworxVolume','scaleIO','storageos','csi','ephemeral','image']

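    # Candidate order: higher spec.priority first, then higher spec.restrictiveScore
    # (defaults 0 and 100). sortBy sorts ascending, hence the negated key.
    - name: ordered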
      expression: |
        variables.assigned.sortBy(n,
          int(variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.?priority.orValue(0)) * -100000 +
          -int(variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.?restrictiveScore.orValue(100))
        )
    - name: requiredExists
      expression: variables.required == '' || variables.profiles.exists(pr, pr.metadata.name == variables.required)
    - name: requiredBound
      expression: variables.required == '' || variables.assigned.exists(n, n == variables.required)
    - name: candidateNames
      expression: |
        variables.required != ''
          ? ((variables.requiredExists && variables.requiredBound) ? [variables.required] : [])
          : variables.ordered
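    # This policy matches UPDATE only for pods/ephemeralcontainers, so an UPDATE
    # request always refers to that subresource.
    - name: isEphemeralSubresource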
      expression: request.operation == 'UPDATE'
    - name: annotatedSelectedName
      expression: object.metadata.?annotations[?'alauda.io/scc'].orValue('')

    - name: matchedNames
      expression: |
        variables.candidateNames.filter(n,
          cel.bind(p, variables.profiles.filter(pr, pr.metadata.name == n)[?0].orValue({}).?spec.orValue({}),
            cel.bind(defaultPE, p.?defaultAllowPrivilegeEscalation.orValue(
                  p.?allowPrivilegeEscalation.orValue(true)),
              cel.bind(podRunAsUserForFill,
                    object.spec.?securityContext.?runAsUser.orValue(
                      (p.?runAsUser.?type.orValue('') == 'MustRunAs' && p.?runAsUser.?uid.hasValue())
                        ? p.?runAsUser.?uid.orValue(1)
                        : 1),
                cel.bind(seccompFirstForFill,
                      p.?seccompProfiles.orValue([]).filter(s, s != '' && s != '*')[?0].orValue(''),
                  cel.bind(fillSeccompType,
                        seccompFirstForFill == 'runtime/default' ? 'RuntimeDefault' :
                        seccompFirstForFill.startsWith('localhost/') ? 'Localhost' : '',
                    cel.bind(fillSeccompLocalhost,
                          fillSeccompType == 'Localhost'
                            ? seccompFirstForFill.substring('localhost/'.size()) : '',
                      cel.bind(needPodSeccompFillForMatch,
                            !object.spec.?securityContext.?seccompProfile.hasValue() &&
                            object.spec.containers.all(c, !c.?securityContext.?seccompProfile.hasValue()) &&
                            object.spec.?initContainers.orValue([]).all(c, !c.?securityContext.?seccompProfile.hasValue()),
                        (p.?allowPrivilegedContainer.orValue(false)
                          || !variables.containers.exists(c, c.?securityContext.?privileged.orValue(false)))
                        && (p.?allowPrivilegeEscalation.orValue(true)
                            || !variables.containers.exists(c, c.?securityContext.?allowPrivilegeEscalation.orValue(defaultPE)))
                        && (p.?allowHostNetwork.orValue(false) || !object.spec.?hostNetwork.orValue(false))
                        && (p.?allowHostPID.orValue(false)     || !object.spec.?hostPID.orValue(false))
                        && (p.?allowHostIPC.orValue(false)     || !object.spec.?hostIPC.orValue(false))
                        && (p.?allowHostDirVolumePlugin.orValue(false)
                            || !object.spec.?volumes.orValue([]).exists(v, has(v.hostPath)))
                        && (
                            p.?runAsUser.?type.orValue('RunAsAny') == 'RunAsAny'
                            || (
                                (p.?runAsUser.?type.orValue('RunAsAny') in ['MustRunAsNonRoot','MustRunAsNonRootOrSystem'])
                                && !variables.containers.exists(c, c.?securityContext.?runAsUser.orValue(
                                      podRunAsUserForFill) == 0)
                              )
                            || (
                                p.?runAsUser.?type.orValue('RunAsAny') == 'MustRunAs'
                                && variables.containers.all(c, c.?securityContext.?runAsUser.orValue(
                                      podRunAsUserForFill) == p.?runAsUser.?uid.orValue(-1))
                              )
                            || (
                                p.?runAsUser.?type.orValue('RunAsAny') == 'MustRunAsRange'
                                && variables.containers.all(c,
                                    c.?securityContext.?runAsUser.orValue(podRunAsUserForFill)
                                      >= p.?runAsUser.?uidRangeMin.orValue(1)
                                    && c.?securityContext.?runAsUser.orValue(podRunAsUserForFill)
                                      <= p.?runAsUser.?uidRangeMax.orValue(2147483647))
                              )
                          )
                        && (p.?allowedCapabilities.orValue([]).exists(t, t == '*')
                            || variables.containers.all(c,
                                c.?securityContext.?capabilities.?add.orValue([]).all(cap,
                                  p.?allowedCapabilities.orValue([]).exists(a, a == cap))))
                        && (p.?requiredDropCapabilities.orValue([]).size() == 0
                            || variables.containers.all(c,
                                p.?requiredDropCapabilities.orValue([]).all(req,
                                  c.?securityContext.?capabilities.?drop.orValue([]).exists(d, d == req || d == 'ALL'))))
                        && (p.?volumes.orValue(['*']).exists(t, t == '*')
                            || object.spec.?volumes.orValue([]).all(v,
                                variables.vtypes.filter(t, v[?t].hasValue()).all(t,
                                  p.?volumes.orValue([]).exists(a, a == t))))
                        && (p.?allowHostPorts.orValue(false)
                            || variables.containers.all(c,
                                c.?ports.orValue([]).all(port, port.?hostPort.orValue(0) == 0)))
                        && (p.?allowedUnsafeSysctls.orValue([]).exists(t, t == '*')
                            || object.spec.?securityContext.?sysctls.orValue([]).all(s,
                                variables.safeSysctls.exists(safe, safe == s.name)
                                || p.?allowedUnsafeSysctls.orValue([]).exists(a, a == s.name)))
                        && (!p.?readOnlyRootFilesystem.orValue(false)
                            || variables.containers.all(c, c.?securityContext.?readOnlyRootFilesystem.orValue(false) == true))
                        && (p.?seccompProfiles.orValue([]).size() == 0
                            || p.?seccompProfiles.orValue([]).exists(t, t == '*')
                            || variables.containers.all(c,
                                p.?seccompProfiles.orValue([]).exists(a,
                                  (c.?securityContext.?seccompProfile.?type.orValue(
                                      object.spec.?securityContext.?seccompProfile.?type.orValue(
                                        (needPodSeccompFillForMatch && fillSeccompType != '') ? fillSeccompType : '')) == 'RuntimeDefault'
                                    && a == 'runtime/default')
                                  || (c.?securityContext.?seccompProfile.?type.orValue(
                                      object.spec.?securityContext.?seccompProfile.?type.orValue(
                                        (needPodSeccompFillForMatch && fillSeccompType != '') ? fillSeccompType : '')) == 'Unconfined'
                                    && a == 'unconfined')
                                  || (c.?securityContext.?seccompProfile.?type.orValue(
                                      object.spec.?securityContext.?seccompProfile.?type.orValue(
                                        (needPodSeccompFillForMatch && fillSeccompType != '') ? fillSeccompType : '')) == 'Localhost'
                                    && a == 'localhost/' + c.?securityContext.?seccompProfile.?localhostProfile.orValue(
                                        object.spec.?securityContext.?seccompProfile.?localhostProfile.orValue(
                                          (needPodSeccompFillForMatch && fillSeccompType == 'Localhost')
                                            ? fillSeccompLocalhost : ''))))))
                        && (p.?allowedFlexVolumes.orValue([]).size() == 0
                            || object.spec.?volumes.orValue([]).filter(v, v.?flexVolume.hasValue()).all(v,
                                p.?allowedFlexVolumes.orValue([]).exists(d, d.?driver.orValue('') == v.flexVolume.driver)))
                      )
                    )
                  )
                )
              )
            )
          )
        )

    - name: selectedName
      expression: |
        variables.isEphemeralSubresource
          && variables.annotatedSelectedName != ''
          && variables.candidateNames.exists(n, n == variables.annotatedSelectedName)
          ? variables.annotatedSelectedName
          : variables.matchedNames[?0].orValue('')

    - name: selectedSpec
      expression: |
        variables.profiles.filter(pr, pr.metadata.name == variables.selectedName)[?0].orValue({}).?spec.orValue({})

    - name: defaultPE
      expression: |
        variables.selectedSpec.?defaultAllowPrivilegeEscalation.orValue(
          variables.selectedSpec.?allowPrivilegeEscalation.orValue(true))

    - name: seccompFirst
      expression: |
        variables.selectedSpec.?seccompProfiles.orValue([])
          .filter(s, s != '' && s != '*')[?0].orValue('')
    - name: defaultSeccompType
      expression: |
        variables.seccompFirst == 'runtime/default' ? 'RuntimeDefault' :
        variables.seccompFirst.startsWith('localhost/') ? 'Localhost' : ''
    - name: defaultSeccompLocalhostProfile
      expression: |
        variables.defaultSeccompType == 'Localhost'
          ? variables.seccompFirst.substring('localhost/'.size()) : ''
    - name: needPodSeccomp
      expression: |
        variables.selectedName != '' && variables.defaultSeccompType != '' &&
        !object.spec.?securityContext.?seccompProfile.hasValue() &&
        object.spec.containers.all(c, !c.?securityContext.?seccompProfile.hasValue()) &&
        object.spec.?initContainers.orValue([]).all(c, !c.?securityContext.?seccompProfile.hasValue())

    - name: hasLiteralUid
      expression: |
        variables.selectedName != '' &&
        variables.selectedSpec.?runAsUser.?type.orValue('') == 'MustRunAs' &&
        variables.selectedSpec.?runAsUser.?uid.hasValue()
    - name: literalUid
      expression: |
        variables.hasLiteralUid ? variables.selectedSpec.?runAsUser.?uid.orValue(-1) : -1
    - name: needPodRunAsUser
      expression: |
        variables.hasLiteralUid &&
        !object.spec.?securityContext.?runAsUser.hasValue()

  mutations:
    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (variables.isEphemeralSubresource || variables.selectedName == '') ? Object{} :
          Object{
            metadata: Object.metadata{
              annotations: {
                "alauda.io/scc": string(variables.selectedName)
              }
            }
          }

    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (variables.isEphemeralSubresource || !variables.needPodRunAsUser) ? Object{} :
          Object{
            spec: Object.spec{
              securityContext: Object.spec.securityContext{
                runAsUser: variables.literalUid
              }
            }
          }

    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (variables.isEphemeralSubresource || !variables.needPodSeccomp) ? Object{} :
          (variables.defaultSeccompType == 'Localhost') ?
          Object{
            spec: Object.spec{
              securityContext: Object.spec.securityContext{
                seccompProfile: Object.spec.securityContext.seccompProfile{
                  type: 'Localhost',
                  localhostProfile: variables.defaultSeccompLocalhostProfile
                }
              }
            }
          } :
          Object{
            spec: Object.spec{
              securityContext: Object.spec.securityContext{
                seccompProfile: Object.spec.securityContext.seccompProfile{
                  type: variables.defaultSeccompType
                }
              }
            }
          }

    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (variables.isEphemeralSubresource || variables.selectedName == '') ? Object{} :
          Object{
            spec: Object.spec{
              containers: object.spec.containers.map(c, Object.spec.containers{
                name: c.name,
                securityContext: Object.spec.containers.securityContext{
                  allowPrivilegeEscalation:
                    c.?securityContext.?allowPrivilegeEscalation.hasValue()
                      ? c.securityContext.allowPrivilegeEscalation
                      : variables.defaultPE
                }
              })
            }
          }

    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (variables.isEphemeralSubresource || variables.selectedName == '' || !object.spec.?initContainers.hasValue()) ? Object{} :
          Object{
            spec: Object.spec{
              initContainers: object.spec.initContainers.map(c, Object.spec.initContainers{
                name: c.name,
                securityContext: Object.spec.initContainers.securityContext{
                  allowPrivilegeEscalation:
                    c.?securityContext.?allowPrivilegeEscalation.hasValue()
                      ? c.securityContext.allowPrivilegeEscalation
                      : variables.defaultPE
                }
              })
            }
          }

    - patchType: ApplyConfiguration
      applyConfiguration:
        expression: |
          (!variables.isEphemeralSubresource || variables.selectedName == '' || !object.spec.?ephemeralContainers.hasValue()) ? Object{} :
          Object{
            spec: Object.spec{
              ephemeralContainers: object.spec.ephemeralContainers.map(c, Object.spec.ephemeralContainers{
                name: c.name,
                securityContext: Object.spec.ephemeralContainers.securityContext{
                  allowPrivilegeEscalation:
                    c.?securityContext.?allowPrivilegeEscalation.hasValue()
                      ? c.securityContext.allowPrivilegeEscalation
                      : variables.defaultPE
                }
              })
            }
          }

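By default, both policies skip the following namespaces: any namespace whose name starts with kube-, cpaas-, or alauda-, plus kyverno, cattle-system, operators, and default. If your platform uses different system namespaces, adjust the skip-system-ns expression in both policies accordingly.

Step 1.4: Roll out safely with Warn → Deny

The validating policy ships with failurePolicy: Fail and validationActions: [Deny], which means it rejects non-compliant Pods immediately. On an existing cluster, enabling it without preparation can break workloads whose ServiceAccount is not yet bound to any SCC.

Use a three-phase rollout:

  1. Warn before the first apply. Before applying scc-auto-pick.yaml to an existing cluster, change validationActions to: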

    validationActions:
      - Warn

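    Then apply the file. The policy now attaches a warning to every admission response that it would otherwise have rejected, but it still admits the Pod. Check the Kyverno admission controller logs to collect the affected workloads: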

    kubectl logs -n kyverno -l app.kubernetes.io/component=admission-controller \
      --tail=500 | grep -i 'scc-auto-pick'
  2. Remediate. For each workload that received a warning, add or correct the RBAC binding so that its ServiceAccount can use a suitable SCC (see Part 2). Confirm with:

    kubectl auth can-i use \
      securitycontextconstraints.security.alauda.io/<scc-name> \
      --as="system:serviceaccount:<namespace>:<sa-name>" -n <namespace>
  3. Deny. Once legitimate workloads no longer produce warnings, switch back to Deny and re-apply:

    validationActions:
      - Deny
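
The Warn and Deny switches in phases 1 and 3 can be scripted instead of edited by hand. The following is a minimal sketch, assuming the policy file stores validationActions as a block list with a single item, as in the snippets above. The helper name set_validation_action is hypothetical and not part of the engine.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the single list item that directly follows
# a "validationActions:" key. Assumes the block-list layout shown above.
set_validation_action() {
  file=$1
  action=$2   # Warn or Deny
  tmp=$(mktemp) || return 1
  awk -v action="$action" '
    # Rewrite a list item only when the previous line was the key itself.
    prev_is_key && /^[ \t]*- / { sub(/- .*/, "- " action) }
    { prev_is_key = ($0 ~ /^[ \t]*validationActions:[ \t]*$/); print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rc=$?
  rm -f "$tmp"
  return $rc
}

# Example (not executed here):
#   set_validation_action scc-auto-pick.yaml Warn
#   kubectl apply -f scc-auto-pick.yaml
```

If your copy of the file uses the inline form (validationActions: [Deny]), this sketch leaves it unchanged, so check the result before applying.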
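Tip

If you need to exempt an entire namespace temporarily, add it to the skip-system-ns expression in both policies, or create a PolicyException resource. For the PolicyException pattern, see Learn more below.

Step 1.5: Verify that the engine is ready

Run the following checks. All resources should exist, and both policies should report READY=true.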

# 1. CRD is established and 13 profiles are loaded
kubectl get crd securitycontextconstraints.security.alauda.io
kubectl get scc

# 2. Five GCE caches exist
kubectl get globalcontextentry scc-profiles scc-clusterroles \
  scc-clusterrolebindings scc-rolebindings scc-roles

# 3. Two admission policies are ready
kubectl get validatingpolicy scc-auto-pick
kubectl get mutatingpolicy scc-fill-defaults

# 4. Reader RBAC is in place
kubectl get clusterrole kyverno-scc-reader
kubectl get clusterrolebinding kyverno-scc-reader

If scc-fill-defaults shows READY=false, the most common cause is missing read permission on pods/ephemeralcontainers. Make sure the kyverno-scc-reader ClusterRole from Step 1.3 has been applied in full.
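
For repeated use, for example in a post-install pipeline, the checks above can be wrapped so that each one prints a single verdict line. This is a rough sketch: check and verify_scc_engine are hypothetical helper names, and the wrapper only confirms that each object exists. It does not read the READY column, so inspect the two policies yourself as well.

```shell
#!/bin/sh
# Run a command quietly and print a one-line verdict for it.
check() {
  desc=$1
  shift
  if "$@" >/dev/null 2>&1; then
    echo "ok   $desc"
  else
    echo "FAIL $desc"
    return 1
  fi
}

# Hypothetical wrapper around the checks listed above; returns non-zero
# if any check fails. Object names are the ones used in this guide.
verify_scc_engine() {
  rc=0
  check "CRD established" kubectl wait --for=condition=Established --timeout=30s \
    crd/securitycontextconstraints.security.alauda.io || rc=1
  check "SCC profiles listed" kubectl get scc || rc=1
  check "context caches" kubectl get globalcontextentry scc-profiles \
    scc-clusterroles scc-clusterrolebindings scc-rolebindings scc-roles || rc=1
  check "validating policy" kubectl get validatingpolicy scc-auto-pick || rc=1
  check "mutating policy" kubectl get mutatingpolicy scc-fill-defaults || rc=1
  check "reader ClusterRole" kubectl get clusterrole kyverno-scc-reader || rc=1
  check "reader ClusterRoleBinding" kubectl get clusterrolebinding kyverno-scc-reader || rc=1
  return $rc
}

# Example (not executed here):
#   verify_scc_engine || echo "SCC engine is not fully installed"
```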

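Part 2: Authorize workloads to use an SCC

After the engine is installed, no Pod is granted an SCC by default. Unless an administrator creates an RBAC binding for a ServiceAccount (or a User, or a Group), Pods running as that subject in a non-system namespace are rejected with the message Pod violates all SCCs assigned to its ServiceAccount.

Treat every SCC binding as a security authorization decision. Only platform administrators or security administrators should be allowed to create SCC bindings; regular application users and namespace owners must not be able to raise the privileges of their own Pods.

Step 2.1: Choose the right SCC profile

Match the workload's security requirements against the table below. By default, the engine sorts the granted SCCs by priority first and by restrictiveScore second. Grant the smallest set of profiles the workload needs, and use alauda.io/required-scc when you must force a specific profile.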

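| Workload characteristics | Recommended SCC |
| --- | --- |
| Stateless service, non-root, drops all capabilities, no host access | restricted-v2 |
| Same as above, but needs to bind ports below 1024 | restricted-v2 (the profile already allows NET_BIND_SERVICE) |
| Same as above, but with a fixed UID range such as 1000–65534 and user namespaces | restricted-v3 |
| Service that runs as non-root but cannot drop all capabilities | nonroot-v2 (drops ALL) or nonroot (legacy drop set) |
| Image that must run as root (USER root) | anyuid |
| Ingress controller or other Pod that needs hostNetwork and host ports | hostnetwork-v2 (drops ALL) or hostnetwork (legacy drop set) |
| Service that mounts hostPath for log or metrics collection and runs as non-root | hostmount-anyuid |
| Same as above, but without SELinux relabeling | hostmount-anyuid-v2 |
| Diagnostic Pod that needs hostNetwork, hostPID, hostIPC, and host paths | hostaccess |
| Container-in-container build sandbox that uses user namespaces | nested-container |
| Fully privileged workload (CNI, storage driver, debug Pod) | privileged |
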
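Warning

Always grant the least privilege required. A ServiceAccount bound to privileged can run any Pod, including Pods that can escape the container boundary. Reserve privileged for infrastructure DaemonSets and do not grant it to user workloads.

When an application manager requests SCC access, the request should include the following information:

  • Namespace and ServiceAccount, for example databases/postgres-sa.
  • Workload name and controller type, for example StatefulSet/postgres.
  • The requested SCC or the required capability, for example: the image runs as UID 0 and therefore needs anyuid.
  • Why a stricter SCC (such as restricted-v2) is not sufficient.
  • Whether the workload must be pinned to a specific SCC with alauda.io/required-scc.

Step 2.2: Bind an SCC to a ServiceAccount

The most common administrator task is binding an SCC to a workload's ServiceAccount. Suppose you have an application that runs as databases/postgres-sa, and its image runs as root (UID 0). You want that ServiceAccount to be allowed to use anyuid while keeping restricted-v2 available for stricter workloads. In this root-UID example, restricted-v2 does not match (runAsUser.uidRangeMin: 1), so admission selects anyuid. More generally, when a Pod satisfies both profiles, the default profile set in this guide prefers anyuid, because the priority of anyuid is higher than that of restricted-v2 unless you adjust the priorities or pin the Pod with alauda.io/required-scc.

Save the following as bind-postgres-sa.yaml: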

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: scc-use-anyuid-restricted
  labels:
    rbac.alauda.io/scc-use: "true"
rules:
  - apiGroups: ["security.alauda.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["anyuid", "restricted-v2"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: postgres-sa-scc
  namespace: databases
  labels:
    rbac.alauda.io/scc-use: "true"
subjects:
  - kind: ServiceAccount
    name: postgres-sa
    namespace: databases
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: scc-use-anyuid-restricted

Apply:

kubectl apply -f bind-postgres-sa.yaml

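The rbac.alauda.io/scc-use=true label is optional. It does not affect SCC selection, but it lets you list every SCC-related RBAC object with kubectl get clusterrole,clusterrolebinding,rolebinding -l rbac.alauda.io/scc-use=true -A.

Note

You can also use a ClusterRoleBinding to grant the namespaced ServiceAccount cluster-wide use permission. A namespaced RoleBinding is usually clearer when you want the grant to take effect in only one namespace.

Step 2.3: Bind an SCC to a User

When a trusted human operator (authenticated as a Kubernetes User, for example through OIDC or a client certificate) needs to launch Pods directly, such as an SRE running kubectl debug or kubectl run, you can grant the SCC to that User subject.

Save as bind-user-sre.yaml, replacing the User name under subjects with your own: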

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: scc-use-hostaccess
  labels:
    rbac.alauda.io/scc-use: "true"
rules:
  - apiGroups: ["security.alauda.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["hostaccess"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: sre-alice-hostaccess
  labels:
    rbac.alauda.io/scc-use: "true"
subjects:
  - kind: User
    name: "<user-name>"  # replace with the authenticated User name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: scc-use-hostaccess

Apply:

kubectl apply -f bind-user-sre.yaml

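When that User runs kubectl run directly (rather than going through a controller's ServiceAccount), the Pod it creates is admitted under the User's identity and receives hostaccess.

Step 2.4: Bind an SCC to a Group

Group bindings suit administrator-managed global policies, such as "every authenticated user may run restricted-v2 Pods". Two synthetic groups are especially relevant:

  • system:authenticated: every authenticated subject.
  • system:serviceaccounts:<namespace>: every ServiceAccount in a specific namespace.

Save as bind-group-authenticated.yaml: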

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: scc-use-restricted-v2
  labels:
    rbac.alauda.io/scc-use: "true"
rules:
  - apiGroups: ["security.alauda.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["restricted-v2"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: scc-use-restricted-v2-authenticated
  labels:
    rbac.alauda.io/scc-use: "true"
subjects:
  - kind: Group
    name: system:authenticated
    apiGroup: rbac.authorization.k8s.io
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: scc-use-restricted-v2
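Warning

The system:authenticated group binding is a catch-all for workloads whose ServiceAccount has no explicit SCC binding. It can serve as a migration safety net during the Warn phase of the rollout in Step 1.4. Remove the catch-all once every workload has an explicit binding. If you keep it permanently, adding an SCC profile with permissive defaults in the future would widen your blast radius.

To limit the binding to the ServiceAccounts in a single namespace, change subjects to: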

subjects:
  - kind: Group
    name: system:serviceaccounts:my-namespace
    apiGroup: rbac.authorization.k8s.io

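Step 2.5: Pin a workload to a specific SCC with alauda.io/required-scc

By default, the engine orders the SCCs that the subject is allowed to use by priority and then by restrictiveScore, and selects the first one that the Pod actually satisfies. If a workload must always be admitted under one specific profile, for example an audit-sensitive deployment that must use restricted-v3 even though its ServiceAccount is also allowed to use anyuid, set the alauda.io/required-scc annotation on the Pod: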

apiVersion: v1
kind: Pod
metadata:
  name: audited-app
  namespace: payments
  annotations:
    alauda.io/required-scc: restricted-v3
spec:
  serviceAccountName: payments-sa
  securityContext:
    runAsNonRoot: true
    runAsUser: 1500
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: registry.example.com/payments/audited-app:1.2.3
      securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]

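The alauda.io/required-scc annotation only selects among the SCCs that the subject is already authorized to use. It does not grant SCC access. For the annotation to take effect, both of the following conditions must hold:

  • A SecurityContextConstraints object named restricted-v3 exists in the cluster.
  • payments/payments-sa is bound to restricted-v3 through a ClusterRole or Role that grants use on that resource name.

If either condition is not met, the Pod is rejected. The validating policy returns a specific message for each case (see Troubleshooting).

With PodTemplate-style controllers (Deployment, StatefulSet, Job), put the annotation in the Pod template's metadata, not on the controller itself: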

apiVersion: apps/v1
kind: Deployment
metadata:
  name: audited-app
  namespace: payments
spec:
  selector:
    matchLabels:
      app: audited-app
  template:
    metadata:
      labels:
        app: audited-app
      annotations:
        alauda.io/required-scc: restricted-v3
    spec:
      serviceAccountName: payments-sa
      # ...
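
For a controller that already exists, the same annotation can be added with a JSON merge patch instead of editing the manifest. The sketch below only builds the patch document. required_scc_patch is a hypothetical helper name, and the Deployment in the usage comment is the audited-app example from above.

```shell
#!/bin/sh
# Hypothetical helper: build a JSON merge patch that sets the
# alauda.io/required-scc annotation on a controller's Pod template.
required_scc_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"alauda.io/required-scc":"%s"}}}}}\n' "$1"
}

# Example (not executed here). Changing the Pod template triggers a rollout.
#   kubectl -n payments patch deployment audited-app --type merge \
#     -p "$(required_scc_patch restricted-v3)"
```

The patch targets spec.template.metadata.annotations, so the annotation lands on the Pods rather than on the controller object.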

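Step 2.6: Verify that the binding took effect

Run the following checks after applying any binding.

Administrator verification: confirm that the subject can use the SCC: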

kubectl auth can-i use \
  securitycontextconstraints.security.alauda.io/anyuid \
  --as="system:serviceaccount:databases:postgres-sa" -n databases

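The expected output is yes. If the result is no, re-check apiGroups, resources, resourceNames, and verbs in the ClusterRole.

Application owner verification: after the administrator confirms the binding, create or redeploy the workload with the approved ServiceAccount, then inspect the annotations on the admitted Pod. For a quick probe, use: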

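# The ServiceAccount is set through --overrides because recent kubectl
# releases no longer accept --serviceaccount on kubectl run.
kubectl -n databases run probe \
  --image=registry.example.com/library/pause:3.10 \
  --overrides='{"spec":{"serviceAccountName":"postgres-sa","securityContext":{"runAsUser":999}}}' \
  --command -- /pause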

kubectl -n databases get pod probe \
  -o jsonpath='{.metadata.annotations.alauda\.io/scc}{"\n"}'

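The output should be the name of the SCC that the engine selected (anyuid in this example). If the application owner cannot create a probe Pod, an administrator can run this check, or inspect a Pod from the real workload.

Note

GlobalContextEntry resources refresh through list/watch and usually propagate a new binding to the admission cache within a few seconds; under heavy load this can take up to a minute. If a Pod is rejected right after you apply a new binding, wait briefly and retry before concluding that the binding is wrong.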
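
The wait-and-retry advice can be automated with a small retry loop. This is a generic sketch rather than part of the engine: retry is a hypothetical helper, and probe-pod.yaml in the usage comment is an assumed file name for whatever manifest you use as a probe.

```shell
#!/bin/sh
# Retry a command up to ATTEMPTS times, sleeping DELAY seconds between
# attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not executed here): retry for up to about one minute.
#   retry 6 10 kubectl -n databases apply -f probe-pod.yaml
```

Six attempts ten seconds apart cover the one-minute upper bound mentioned in the note. If the Pod is still rejected after that, inspect the binding itself.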

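Results

After completing Part 1 and at least one Part 2 binding, you should be able to verify all of the following:

  • kubectl get crd securitycontextconstraints.security.alauda.io returns the CRD, and its Established condition is True.
  • kubectl get scc lists every SCC profile you installed.
  • kubectl get globalcontextentry returns all five scc-* entries.
  • kubectl get validatingpolicy scc-auto-pick and kubectl get mutatingpolicy scc-fill-defaults both show READY=true.
  • In a non-system namespace, a Pod created with a bound ServiceAccount receives the alauda.io/scc=<name> annotation, which records the SCC the engine selected.
  • In a non-system namespace, a Pod created with an unbound ServiceAccount is rejected at admission with the message Pod violates all SCCs assigned to its ServiceAccount.

Troubleshooting

Use the table below to map symptoms to causes and resolution steps.

For Pods created by controllers such as Deployment, StatefulSet, Job, and DaemonSet, the effective workload identity is normally the Pod's ServiceAccount. For Pods created directly by a trusted human operator, for example with kubectl run or kubectl debug, User and Group SCC bindings can also match the admission request.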

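| Symptom | Likely cause | What to check |
| --- | --- | --- |
| Pod violates all SCCs assigned to its ServiceAccount (candidates: ...) | The Pod's ServiceAccount is bound to at least one SCC, but the Pod spec violates every one of them. The candidate list at the end of the message names the SCCs that were considered. | For each candidate SCC, compare the Pod against the SCC's fields. Common mismatches: a container runAsUser outside the allowed range; drop: [ALL] not set when the SCC requires requiredDropCapabilities: [ALL]; seccompProfile.type not set when the SCC requires runtime/default. |
| Pod violates all SCCs assigned to its ServiceAccount (candidates: ) (the candidate list is empty) | No SCC is bound to the Pod's ServiceAccount. | For each SCC name in kubectl get scc, run kubectl auth can-i use securitycontextconstraints.security.alauda.io/<name> --as=system:serviceaccount:<ns>:<sa> -n <ns>. If every result is no, add a binding as described in Step 2.2. |
| required SCC '<name>' not found in scc-profiles | The alauda.io/required-scc annotation references an SCC that does not exist. | Run kubectl get scc <name>. Correct the annotation, or install the missing profile. |
| required SCC '<name>' is not bound to ServiceAccount '<sa>' in namespace '<ns>' | The annotation references an SCC on which the ServiceAccount has no use permission. | Add a RoleBinding that grants the SA use on securitycontextconstraints/<name>, then retry. |
| Pod is still rejected right after a binding was added | Kyverno GlobalContextEntry resources cache RBAC objects asynchronously; a new binding takes a few seconds to propagate. | Wait 10–30 seconds and retry. Check kubectl get globalcontextentry scc-rolebindings -o jsonpath='{.status.lastRefreshTime}{"\n"}' to confirm a recent refresh. |
| Pod is admitted, but runAsUser is unexpectedly set or not set | The mutating policy filled in a default from the selected SCC, or skipped it because the Pod already declared the value. | Check the alauda.io/scc annotation on the Pod to confirm which SCC was selected, then look at that SCC's runAsUser.type and runAsUser.uid. A runAsUser that the Pod declares itself is never overwritten. |
| scc-fill-defaults or scc-auto-pick shows READY=false | Kyverno lacks read permission on a resource the policy matches (most commonly pods/ephemeralcontainers). | Re-apply the complete kyverno-scc-reader ClusterRole from Step 1.3. |
| A Pod in a namespace labeled pod-security.kubernetes.io/enforce: restricted is rejected before Kyverno sees it | The Kubernetes Pod Security Admission plugin runs before Kyverno and enforces the namespace label independently. | Depending on what the workloads in that namespace need, relax the namespace label to baseline or privileged, or limit the SCCs offered there. |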
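
The per-SCC kubectl auth can-i check from the empty-candidate row can be looped over every installed profile. The sketch below assumes the caller is allowed to impersonate the ServiceAccount (the --as flag); sa_subject and list_usable_sccs are hypothetical helper names.

```shell
#!/bin/sh
# Build the impersonation subject for a ServiceAccount.
sa_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Hypothetical helper: print the name of every SCC the ServiceAccount may use.
list_usable_sccs() {
  ns=$1
  sa=$2
  subject=$(sa_subject "$ns" "$sa")
  # `-o name` prints securitycontextconstraints.security.alauda.io/<name>,
  # the same resource/name form that `kubectl auth can-i` accepts.
  kubectl get scc -o name | while read -r scc; do
    # can-i exits non-zero when the answer is "no".
    if kubectl auth can-i use "$scc" --as="$subject" -n "$ns" >/dev/null 2>&1; then
      echo "${scc##*/}"
    fi
  done
}

# Example (not executed here):
#   list_usable_sccs databases postgres-sa
```

An empty result corresponds to the empty candidate list in the rejection message.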

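Learn more

Temporarily bypass a policy with PolicyException

When you need to let a single ServiceAccount exceed its current SCC scope for a short time (for example, during an emergency debugging session) and changing the RBAC bindings is not appropriate, you can use a PolicyException resource. This feature requires the Kyverno admission controller to be started with --enablePolicyException=true.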

apiVersion: policies.kyverno.io/v1alpha1
kind: PolicyException
metadata:
  name: postgres-debug-bypass
  namespace: policy-exceptions
spec:
  policyRefs:
    - name: scc-auto-pick
      kind: ValidatingPolicy
  matchConditions:
    - name: target-sa
      expression: |
        object.metadata.namespace == 'databases' &&
        object.spec.?serviceAccountName.orValue('') == 'postgres-sa'
    - name: must-be-debug-window
      expression: |
        object.metadata.?labels[?'debug-window'].orValue('') == 'open'

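Best practice: keep PolicyException resources in a dedicated namespace (for example policy-exceptions) and restrict write access to it; add owner and expire-at labels to every exception, and audit them on a regular schedule.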
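
A periodic audit of the expire-at label could look like the sketch below. It rests on two assumptions that the guide does not state: the label holds an ISO date (YYYY-MM-DD), and the exceptions are listed through the plural resource name policyexceptions.policies.kyverno.io. The helper name overdue_exceptions is hypothetical.

```shell
#!/bin/sh
# Read "<namespace> <name> <expire-at>" lines from stdin and print the
# exceptions whose expire-at date is missing, malformed, or before TODAY.
# ISO dates (YYYY-MM-DD) compare correctly as plain strings.
overdue_exceptions() {
  today=$1
  awk -v today="$today" '
    $3 !~ /^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
      print $1 "/" $2 " (no valid expire-at)"; next
    }
    ($3 "") < (today "") { print $1 "/" $2 " (expired " $3 ")" }
  '
}

# Example (not executed here); a missing label is printed as <none>:
#   kubectl get policyexceptions.policies.kyverno.io -A --no-headers \
#     -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,EXP:.metadata.labels.expire-at' \
#     | overdue_exceptions "$(date +%F)"
```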

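How the engine selects an SCC

When a subject has been granted several SCCs and the Pod satisfies more than one of them, the validating policy orders the candidates as follows:

  1. Higher priority first.
  2. Then higher restrictiveScore.

The first candidate that the Pod fully satisfies becomes the selected SCC. The mutating policy uses the same ordering when it chooses the SCC whose defaults it fills in. This preserves the OpenShift intent that the most restrictive acceptable SCC wins, while still letting operators override the order through each profile's priority.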
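
The ordering rule can be illustrated outside the cluster with sort. This sketch models only the ordering, not the check that the Pod satisfies each candidate, and the priority and restrictiveScore numbers in the usage comment are made up for illustration rather than taken from the shipped profiles.

```shell
#!/bin/sh
# Order candidates read from stdin as "<name> <priority> <restrictiveScore>":
# higher priority first, then higher restrictiveScore.
order_candidates() {
  sort -k2,2nr -k3,3nr
}

# Example with made-up numbers:
#   printf '%s\n' 'restricted-v2 0 90' 'anyuid 10 40' 'nonroot-v2 0 70' \
#     | order_candidates
```

With these sample values anyuid sorts first because of its higher priority, and the two priority-0 profiles follow in descending restrictiveScore order.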

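Mapping to OpenShift commands

If you are coming from OpenShift, the following oc commands translate directly into plain kubectl apply operations against the SCC engine. These operations grant SCC use permission and should be performed only by administrators who are authorized to change the cluster's Pod security boundary.

| OpenShift command | Equivalent in this engine |
| --- | --- |
| oc adm policy add-scc-to-user <scc> <user> | Create a ClusterRole that grants use on securitycontextconstraints/<scc>, then create a ClusterRoleBinding with subjects: [{kind: User, name: <user>}]. |
| oc adm policy add-scc-to-user <scc> -z <sa> -n <ns> | The same ClusterRole as above, plus a RoleBinding in namespace <ns> with subjects: [{kind: ServiceAccount, name: <sa>, namespace: <ns>}]. |
| oc adm policy add-scc-to-group <scc> <group> | The same ClusterRole, plus a ClusterRoleBinding with subjects: [{kind: Group, name: <group>}]. |
| oc get scc | kubectl get scc (the CRD's shortNames: [scc] keeps the command identical). |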
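
The ServiceAccount row of the mapping can be sketched as a small manifest generator. The function name scc_sa_binding is hypothetical, and the object names it produces (scc-use-<scc> and <sa>-scc-<scc>) are an assumed naming pattern modeled on the examples in Part 2, not something the engine requires.

```shell
#!/bin/sh
# Hypothetical helper mirroring `oc adm policy add-scc-to-user <scc> -z <sa> -n <ns>`:
# prints a ClusterRole plus a namespaced RoleBinding to stdout.
scc_sa_binding() {
  scc=$1
  ns=$2
  sa=$3
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: scc-use-${scc}
  labels:
    rbac.alauda.io/scc-use: "true"
rules:
  - apiGroups: ["security.alauda.io"]
    resources: ["securitycontextconstraints"]
    resourceNames: ["${scc}"]
    verbs: ["use"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ${sa}-scc-${scc}
  namespace: ${ns}
  labels:
    rbac.alauda.io/scc-use: "true"
subjects:
  - kind: ServiceAccount
    name: ${sa}
    namespace: ${ns}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: scc-use-${scc}
EOF
}

# Example (not executed here):
#   scc_sa_binding anyuid databases postgres-sa | kubectl apply -f -
```

Printing the manifest instead of applying it directly keeps the output reviewable, which fits committing bindings through a GitOps workflow.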

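Next steps

  • After reviewing workload requirements, decide which SCC each existing namespace and ServiceAccount should be bound to, record the mapping, and apply the bindings through a GitOps workflow so that they are auditable and repeatable.
  • Schedule a periodic review of PolicyException resources. They are intended only for short time windows, not as permanent exemptions.
  • If you operate at scale, monitor the kyverno_admission_review_duration_seconds metric of the Kyverno admission controller so that you notice changes in admission latency as the number of SCC profiles or RBAC bindings grows.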