Overcommit Ratio
⚠️ This feature is still experimental. Please use it with caution.
Understanding the Overcommit Ratio in Hami vGPU
Hami supports configuring a global overcommit ratio for both vGPU compute cores and memory. The purpose of the vGPU overcommit ratio is to improve overall GPU utilization, not to increase the resources available to any individual task. Overcommit is a purely logical mechanism: it only changes how hami-scheduler accounts for allocatable vGPU resources, not the physical capacity of the card.
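For intuition, here is a minimal sketch (not HAMi code) of what "logical" means here: the configured ratios only scale the capacity that hami-scheduler accounts for when admitting pods, while the physical card stays the same. The card size and ratio values below are illustrative assumptions.

```python
# Minimal sketch (not HAMi code): the overcommit ratio scales only the
# capacity that hami-scheduler uses for admission decisions.
PHYSICAL_CORES_PERCENT = 100   # one physical GPU card = 100% compute
PHYSICAL_MEMORY_MIB = 24_576   # assumption: a 24 GiB card

core_scaling = 2.0             # "NVIDIA Device Core Scaling"
memory_scaling = 1.5           # "NVIDIA Device Memory Scaling"

# What the scheduler treats as allocatable on this card:
logical_cores_percent = PHYSICAL_CORES_PERCENT * core_scaling   # 200%
logical_memory_mib = PHYSICAL_MEMORY_MIB * memory_scaling       # 36 GiB

print(logical_cores_percent, logical_memory_mib)
```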
Key Concepts
- NVIDIA Device Core Scaling: Overcommit ratio applied to GPU compute cores.
- NVIDIA Device Memory Scaling: Overcommit ratio applied to GPU memory.
Core Capabilities
- Enable higher GPU utilization, allowing more workloads to share a single GPU card.
Configuring the Overcommit Ratio
- Go to Administrator → Marketplace → Cluster Plugin.
- Switch to the target cluster.
- Update the parameters NVIDIA Device Core Scaling and NVIDIA Device Memory Scaling when deploying or upgrading the Alauda Build of Hami cluster plugin (a verification sketch follows this list).
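After deploying or upgrading the plugin, one optional way to sanity-check the result is to inspect the allocatable extended resources advertised by a GPU node. The sketch below uses the Kubernetes Python client; the node name is hypothetical, and the resource names (filtered here by the upstream HAMi prefix `nvidia.com/`) may differ in your build.

```python
# Minimal verification sketch, assuming a working kubeconfig.
# Node name and resource-name prefix are assumptions; adjust as needed.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

node = v1.read_node("gpu-node-1")  # hypothetical GPU node name
for name, quantity in sorted(node.status.allocatable.items()):
    if name.startswith("nvidia.com/"):
        print(f"{name}: {quantity}")
```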
Notes
vGPU Core Overcommit Ratio
- When the overcommit ratio for GPU cores is greater than 1, multiple workloads may request more than 100% of the GPU compute capacity.
- If all workloads run at full load, they share the physical GPU compute equally (up to their requested share). As a result, each workload may run slower compared to using a dedicated GPU.
- If some workloads are idle, active workloads can utilize the freed capacity.
Example:
- Core overcommit ratio = 2 → one GPU card provides a logical 200% of allocatable cores.
- Four pods request: Pod A = 80%, Pod B = 60%, Pod C = 40%, Pod D = 20%.
- Scenarios:
- If all pods are busy, Pod D receives its requested 20%, while Pods A–C compete for the remaining 80% (≈26.7% each); this arithmetic is reproduced in the sketch after this example.
- If only Pod A is active, it can utilize up to 80% of the cores.
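The sketch below reproduces the arithmetic of the example above. It assumes that under full load the physical 100% of compute is divided in a max-min fair way, with each pod capped at its requested share; the actual runtime enforcement is done by HAMi's in-container core limiting and may differ in detail.

```python
# Sketch of the sharing arithmetic above, assuming max-min fair sharing of
# the physical 100% compute, with each pod capped at its requested share.
def full_load_shares(requests, capacity=100.0):
    shares = {name: 0.0 for name in requests}
    remaining = dict(requests)          # unmet requests, in percent
    capacity_left = capacity
    while remaining and capacity_left > 1e-9:
        fair = capacity_left / len(remaining)
        # Pods asking for no more than the fair share are fully satisfied;
        # their unused capacity is redistributed to the remaining pods.
        satisfied = {n: r for n, r in remaining.items() if r <= fair}
        if not satisfied:
            for n in remaining:
                shares[n] += fair
            break
        for n, r in satisfied.items():
            shares[n] += r
            capacity_left -= r
            del remaining[n]
    return shares

print(full_load_shares({"A": 80, "B": 60, "C": 40, "D": 20}))
# {'A': 26.67, 'B': 26.67, 'C': 26.67, 'D': 20.0} (approximately)
```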
vGPU Memory Overcommit Ratio
- When the memory overcommit ratio is greater than 1, workloads may collectively request more than the physical GPU memory of a card.
- If the total requests exceed the physical memory and all pods attempt to use their full allocation, some workloads may encounter `CUDA out of memory` errors.
- Use the memory overcommit ratio with caution, as it can directly lead to application failures; see the sketch below.
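As an illustration of this failure mode, the sketch below shows how pods can be admitted against the scaled (logical) memory even though their combined usage cannot fit in the physical memory. The card size, ratio, and pod requests are assumptions.

```python
# Sketch (not HAMi code): admission is checked against logical memory,
# but runtime allocations still hit the physical limit.
PHYSICAL_MEMORY_MIB = 24_576               # assumption: 24 GiB card
memory_scaling = 1.5
logical_memory_mib = PHYSICAL_MEMORY_MIB * memory_scaling   # 36 GiB

requests_mib = {"pod-a": 16_384, "pod-b": 16_384}           # 32 GiB total

# Scheduling succeeds: total requests fit within the logical capacity.
assert sum(requests_mib.values()) <= logical_memory_mib

# But if both pods allocate their full request, peak usage exceeds the
# physical memory and one of them will fail with a CUDA out-of-memory error.
peak_usage_mib = sum(requests_mib.values())
print(peak_usage_mib > PHYSICAL_MEMORY_MIB)   # True -> OOM risk
```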
Scope
- The overcommit ratio described here applies only to NVIDIA GPUs.