Planning Infra Nodes for Logging Storage

This guide explains the planning considerations for running Logging storage plugins on dedicated Kubernetes infra nodes.

Objectives

  • Isolate resources: Prevent contention with business workloads.
  • Enforce stability: Reduce evictions and scheduling conflicts.
  • Simplify management: Centralize infra components with consistent scheduling rules.

Where to Configure Placement

  • For Alauda Container Platform Log Storage for Elasticsearch, configure placement through spec.valuesOverride.ait/chart-alauda-log-center.global.nodeSelector and spec.valuesOverride.ait/chart-alauda-log-center.global.tolerations in Installation.
  • For Alauda Container Platform Log Storage for ClickHouse, configure placement through Advanced Configuration in the console or through spec.config.components.nodeSelector and spec.config.components.tolerations in Installation.
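As a sketch of what those field paths look like in practice, the fragments below show placement overrides inside an Installation spec. Only the field paths themselves come from the list above; the label key and toleration values (node-role.kubernetes.io/infra) are assumptions borrowed from the Troubleshooting example later in this guide, so substitute the labels and taints used by your own infra node group.

```yaml
# Elasticsearch plugin: Installation spec fragment (sketch).
# "ait/chart-alauda-log-center" is a single map key containing a slash.
spec:
  valuesOverride:
    ait/chart-alauda-log-center:
      global:
        nodeSelector:
          node-role.kubernetes.io/infra: ""   # assumption: infra node label
        tolerations:
        - key: node-role.kubernetes.io/infra  # assumption: infra node taint
          operator: Exists
          effect: NoSchedule
---
# ClickHouse plugin: Installation spec fragment (sketch).
spec:
  config:
    components:
      nodeSelector:
        node-role.kubernetes.io/infra: ""
      tolerations:
      - key: node-role.kubernetes.io/infra
        operator: Exists
        effect: NoSchedule
```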

Do not patch the generated StatefulSets, Deployments, or ClickHouseInstallation resources to place Logging storage workloads on infra nodes; always configure placement through the Installation resource as described above.

Before You Configure Placement

  1. Plan the infra nodes according to Cluster Node Planning.
  2. Confirm whether your storage uses LocalVolume or other PVs with spec.nodeAffinity.
  3. Make sure the selected infra nodes can satisfy both the scheduling rules and the storage placement constraints.
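To illustrate step 1, an infra node prepared for these workloads might carry a label and taint like the following sketch. The node name is hypothetical, and the label value and taint effect are assumptions; align them with your Cluster Node Planning (the taint key and value here match the ones shown in the Troubleshooting example below).

```yaml
apiVersion: v1
kind: Node
metadata:
  name: infra-node-1                      # hypothetical node name
  labels:
    node-role.kubernetes.io/infra: ""     # assumption: infra node label
spec:
  taints:
  - key: node-role.kubernetes.io/infra    # matches the taint in Troubleshooting
    value: "true"
    effect: NoSchedule
```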

Check Local PVs and nodeAffinity

If your components use local storage (for example, TopoLVM or local PVs), check whether the PVs have spec.nodeAffinity. If they do, either:

  1. Add all nodes referenced by pv.spec.nodeAffinity to the infra node group, or
  2. Redeploy components using a storage class without node affinity (for example Ceph/RBD).

Example (Elasticsearch):

# 1) Get ES PVCs
kubectl get pvc -n cpaas-system | grep elastic

# 2) Inspect one PV
kubectl get pv elasticsearch-log-node-pv-192.168.135.243 -o yaml

If the PV shows:

spec:
  local:
    path: /cpaas/data/elasticsearch/data
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - 192.168.135.243

Then Elasticsearch data is pinned to node 192.168.135.243. Ensure that node is part of the infra node group, or migrate storage.

The same principle applies to any Logging storage component that uses node-bound local storage.

Historical Kafka and ZooKeeper Nodes

For historical reasons, Kafka and ZooKeeper are scheduled to nodes carrying dedicated labels. Ensure those nodes are also labeled and tainted as infra:

kubectl get nodes -l kafka=true
kubectl get nodes -l zk=true
# Add the listed nodes into infra nodes as above

Troubleshooting

Common issues and fixes:

  • Pods stuck in Pending. Diagnosis: kubectl describe pod <pod> | grep Events. Solution: add tolerations or adjust selectors.
  • Taint/toleration mismatch. Diagnosis: kubectl describe node <node> | grep Taints. Solution: add matching tolerations to the workloads.
  • Resource starvation. Diagnosis: kubectl top nodes -l node-role.kubernetes.io/infra. Solution: scale infra nodes or tune resource requests.

Example error:

Events:
  Warning  FailedScheduling  2m  default-scheduler  0/3 nodes are available:
  3 node(s) had untolerated taint {node-role.kubernetes.io/infra: true}

Fix: add matching tolerations to the plugin configuration and make sure the selected infra nodes also satisfy the required storage placement constraints.
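For the error above, a toleration that exactly matches the reported taint could look like this fragment. It is shown under the ClickHouse-style spec.config.components path as a sketch; the same tolerations block applies under the Elasticsearch valuesOverride path, and the nodeSelector label is an assumption about how your infra nodes are labeled.

```yaml
spec:
  config:
    components:
      nodeSelector:
        node-role.kubernetes.io/infra: ""  # assumption: infra node label
      tolerations:
      - key: node-role.kubernetes.io/infra # matches the taint in the error
        operator: Equal
        value: "true"
        effect: NoSchedule
```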