Isolating Log Components on Kubernetes Infra Nodes
This guide explains how to isolate logging-related infrastructure components on dedicated Kubernetes infra nodes using labels, taints, and node selectors.
Objectives
- Isolate resources: Prevent contention with business workloads.
- Enforce stability: Reduce evictions and scheduling conflicts.
- Simplify management: Centralize infra components with consistent scheduling rules.
Prerequisites
- kubectl is configured against the target cluster.
- Infra components are not bound to nodes via local-PV nodeAffinity, or you have accounted for those nodes (see below).
- Infra nodes have been planned according to the Cluster Node Planning guide.
Check the Local PVs and nodeAffinity
If your components use local storage (for example TopoLVM, local PV), confirm whether PVs have spec.nodeAffinity. If so, either:
- Add all nodes referenced by pv.spec.nodeAffinity to the infra node group, or
- Redeploy components using a storage class without node affinity (for example Ceph/RBD).
Example (Elasticsearch):
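One way to inspect this is sketched below with standard kubectl calls. The function names are hypothetical helpers, and the column layout is only one possible choice; the PV name passed to the second helper is whatever the first one reports for the Elasticsearch claim in your cluster.

```shell
# List every PV with its bound claim and any node values found in
# spec.nodeAffinity. kubectl prints <none> when a field is absent,
# which here means the PV is not pinned to a node.
list_pv_node_affinity() {
  kubectl get pv -o custom-columns='NAME:.metadata.name,CLAIM:.spec.claimRef.name,NODES:.spec.nodeAffinity.required.nodeSelectorTerms[*].matchExpressions[*].values[*]'
}

# Print the raw nodeAffinity of a single PV.
# $1 = PV name (taken from the listing above)
show_pv_node_affinity() {
  kubectl get pv "$1" -o jsonpath='{.spec.nodeAffinity}{"\n"}'
}

# Usage (run against the target cluster):
#   list_pv_node_affinity
#   show_pv_node_affinity <pv-name>
```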
If the PV's spec.nodeAffinity references node 192.168.135.243, Elasticsearch data is pinned to that node. Ensure that node is part of the infra node group, or migrate the storage.
Add Kafka/ZooKeeper nodes into infra nodes
For historical reasons, the nodes that run Kafka and ZooKeeper must also be labeled and tainted as infra nodes.
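Labeling and tainting a node might look like the sketch below. The label key node-role.kubernetes.io/infra and the taint value true:NoSchedule are assumptions based on a common convention, not values confirmed by this guide; check what your cluster's infra nodes already use. The function name is a hypothetical helper.

```shell
# Assumed infra label/taint key; adjust to match your cluster.
INFRA_KEY="node-role.kubernetes.io/infra"

# Label a node as infra and taint it so that only Pods with a
# matching toleration are scheduled onto it.
# $1 = node name
mark_infra_node() {
  kubectl label node "$1" "${INFRA_KEY}=" --overwrite
  kubectl taint nodes "$1" "${INFRA_KEY}=true:NoSchedule" --overwrite
}

# Usage: repeat for each node that hosts Kafka or ZooKeeper.
#   mark_infra_node <node-name>
```

A NoSchedule taint only affects new scheduling decisions; Pods already running on the node stay where they are.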
Move Logging Components to Infra Nodes
ACP logging components tolerate infra taints by default. Use nodeSelector to pin workloads onto infra nodes.
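A minimal sketch of setting the nodeSelector with kubectl patch follows. The helper name is hypothetical, the label key node-role.kubernetes.io/infra is an assumed convention, and the kind, name, and namespace of each component depend on how it is deployed in your cluster.

```shell
# Add an infra nodeSelector to a workload's Pod template.
# A merge patch adds this key to the nodeSelector map and leaves
# other keys in place.
# $1 = kind (deployment, statefulset, daemonset)
# $2 = workload name
# $3 = namespace
pin_to_infra() {
  kubectl patch "$1" "$2" -n "$3" --type=merge \
    -p '{"spec":{"template":{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}}'
}

# Usage:
#   pin_to_infra statefulset <name> <namespace>
```

If a component is managed by an operator or a chart, a direct patch may be reverted; in that case the selector likely needs to be set on the managing resource instead.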
Elasticsearch
Kafka
ZooKeeper
ClickHouse
lanaya
razor
Any other logging component
Evict non-infra workloads already on infra nodes
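A NoSchedule taint does not evict Pods that are already running, so some non-infra Pods may remain on infra nodes. To move them, trigger a reschedule by updating those workloads (for example, change a Pod template annotation), or add node selectors or affinity rules that exclude infra nodes.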
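The reschedule can be sketched with two standard kubectl calls. The helper names are hypothetical. kubectl rollout restart works by updating an annotation on the Pod template, so the replacement Pods go through scheduling again and, lacking a toleration, should no longer land on tainted infra nodes.

```shell
# List all Pods currently running on a given node.
# $1 = node name
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# Recreate a workload's Pods so the scheduler places them again.
# $1 = kind (deployment, statefulset, daemonset)
# $2 = workload name
# $3 = namespace
restart_workload() {
  kubectl rollout restart "$1/$2" -n "$3"
}

# Usage:
#   pods_on_node <infra-node-name>
#   restart_workload deployment <name> <namespace>
```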
Troubleshooting
Common issues and fixes:
Example error:
Fix: add matching tolerations to the workload.
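Adding a toleration could look like the sketch below. The helper names are hypothetical, and the taint key node-role.kubernetes.io/infra with effect NoSchedule is an assumption; compare it against the taints actually set on your infra nodes before applying.

```shell
# Show the taints on a node, to find the key and effect to tolerate.
# $1 = node name
show_node_taints() {
  kubectl get node "$1" -o jsonpath='{.spec.taints}{"\n"}'
}

# Set a toleration for the assumed infra taint on a workload.
# Caution: a merge patch replaces the whole tolerations list, so
# include any tolerations the workload already needs.
# $1 = kind, $2 = workload name, $3 = namespace
tolerate_infra() {
  kubectl patch "$1" "$2" -n "$3" --type=merge \
    -p '{"spec":{"template":{"spec":{"tolerations":[{"key":"node-role.kubernetes.io/infra","operator":"Exists","effect":"NoSchedule"}]}}}}'
}

# Usage:
#   show_node_taints <infra-node-name>
#   tolerate_infra deployment <name> <namespace>
```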