⚠️ This feature is still experimental. Please use it with caution.
Enable dynamic MIG feature
HAMi now supports dynamic MIG using mig-parted to adjust MIG devices dynamically, including:
- Dynamic MIG Instance Management: Users no longer need to operate directly on GPU nodes or run commands such as `nvidia-smi -i 0 -mig 1` to manage MIG instances. HAMi-device-plugin handles this automatically.
- Dynamic MIG Adjustment: Each MIG device managed by HAMi dynamically adjusts its MIG template according to the jobs submitted, as needed.
- Device MIG Observation: Each MIG instance generated by HAMi is displayed in the scheduler monitor, along with job information, providing a clear overview of MIG nodes.
- Compatibility with HAMi-Core Nodes: HAMi can manage a unified GPU pool across both HAMi-core nodes and MIG nodes. A job can be scheduled to either kind of node unless a mode is manually specified using the `nvidia.com/vgpu-mode` annotation (see the example after this list).
- Unified API with HAMi-Core: No additional work is required to make jobs compatible with the dynamic MIG feature.
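As a quick illustration, the sketch below shows how the `nvidia.com/vgpu-mode` annotation might be used to pin a job to MIG nodes. The pod name and image are placeholders, and the accepted values (`mig`, with `hami-core` assumed as the other mode) should be confirmed against your HAMi release.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-to-mig                  # placeholder name
  annotations:
    # Pin scheduling to MIG nodes; "hami-core" would pin to HAMi-core nodes instead.
    nvidia.com/vgpu-mode: "mig"
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpualloc: 1
```

Without this annotation, the scheduler is free to place the job on either a HAMi-core node or a MIG node.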
Prerequisites
- NVIDIA Blackwell, Hopper™, and Ampere GPUs
- Alauda Build of HAMi installed
Enable dynamic MIG support
- Configure the mode in the device-plugin ConfigMap to `mig` for MIG nodes. Replace the node name in the `nodeconfig` array with the name of the node you want to set to MIG mode; if there are multiple nodes, add one array element per node. A sketch of this ConfigMap is shown after this list.
- Restart the following pods for the change to take effect:
  - hami-scheduler
  - hami-device-plugin on node 'MIG-NODE-A'
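The exact layout of the device-plugin ConfigMap depends on how HAMi was installed. The sketch below assumes the upstream HAMi layout, where a `nodeconfig` array in the plugin configuration holds per-node operating modes; the ConfigMap name, namespace, key, and field names are assumptions, and only the `nodeconfig` array and the `mig` mode come from the steps above.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: hami-device-plugin        # assumed ConfigMap name
  namespace: kube-system          # assumed namespace
data:
  # The key and field names below follow upstream HAMi and are assumptions here;
  # adjust them to match the ConfigMap shipped with your installation.
  config.json: |
    {
      "nodeconfig": [
        {
          "name": "MIG-NODE-A",
          "operatingmode": "mig"
        }
      ]
    }
```

To put several nodes into MIG mode, append one element per node to the `nodeconfig` array.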
Note: The above configuration will be lost during upgrades; future versions of HAMi will improve this.
Custom MIG configuration (optional)
HAMi currently ships with a built-in MIG template configuration.
You can customize the MIG configuration by following the steps below:
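As a hedged sketch of what a customized entry might look like, the excerpt below assumes the upstream HAMi `knownMigGeometries` layout; the field names, GPU model, geometry names, and memory values (in MiB) are illustrative assumptions, not a definitive reference.

```yaml
# Hypothetical excerpt of the MIG configuration; the knownMigGeometries layout
# follows upstream HAMi and may differ in your installation.
nvidia:
  knownMigGeometries:
    - models: ["A100-SXM4-80GB"]
      allowedGeometries:
        - group: "group1"
          geometries:
            - name: "1g.10gb"
              memory: 9728      # MiB
              count: 7
        - group: "group2"
          geometries:
            - name: "2g.20gb"
              memory: 19968
              count: 3
            - name: "1g.10gb"
              memory: 9728
              count: 1
```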
Then restart the hami-scheduler components. HAMi will identify and use the first MIG template that matches the job, in the order defined in this configMap.
Note: The above configuration will be lost during upgrades; future versions of HAMi will improve this.
Running MIG jobs
A MIG instance can now be requested by a container in the same way as on hami-core nodes, simply by specifying the `nvidia.com/gpualloc` and `nvidia.com/gpumem` resource types.
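A minimal sketch of such a request is shown below; the pod name and image are placeholders, and `nvidia.com/gpumem` is assumed to be expressed in MiB.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mig-example                                # placeholder name
spec:
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.0-base-ubuntu22.04   # placeholder image
      command: ["sleep", "infinity"]
      resources:
        limits:
          nvidia.com/gpualloc: 1    # number of GPUs; cannot exceed the physical GPU count (see notes below)
          nvidia.com/gpumem: 8000   # requested device memory (assumed MiB); HAMi picks a matching MIG template
```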
Note:
- The number of `nvidia.com/gpualloc` cannot exceed the actual number of physical GPUs. For example, on a single GPU in MIG mode it can only be set to 1. This is a limitation of HAMi and will be improved in future versions.
- No action is required on MIG nodes; everything is managed by mig-parted in hami-device-plugin.
- NVIDIA devices older than the Ampere architecture do not support MIG mode.
- MIG resources (for example, `nvidia.com/mig-1g.10gb`) will not be visible on the node. HAMi uses a unified resource name for both MIG and hami-core nodes.
- The `DCGM-exporter` component deployed on MIG nodes must be stopped when performing MIG partitioning, because MIG partitioning requires resetting the GPU. After the first MIG-enabled workload is created, automatic MIG partitioning is performed. Subsequent workloads will not trigger further partitioning. When all workloads stop, starting the first workload again will trigger MIG partitioning once more.