Flex-MIG: Enabling Distributed Execution on MIG
Myeongsu Kim, Ikjun Yeom, Younghoon Kim
TL;DR
Flex-MIG recasts MIG's rigid one-to-one hardware partitioning as a software-managed one-to-many model in which a single job can span multiple MIG leaves without draining or reconfiguring GPUs. It introduces two coordinated layers: an orchestration layer that schedules and places multiple MIG leaves per job using size- and topology-aware heuristics, and a runtime layer that enables host-shared-memory collectives across MIG instances by extending NCCL with MIG-aware peer discovery and synthetic Bus-ID labeling. The approach evens out resource utilization, reduces fragmentation, and avoids disruptive reconfiguration, improving makespan by up to 17% and raising cluster throughput in trace-driven simulations validated against real measurements. The work shows that software-driven resource coordination can unlock efficiency gains in multi-tenant GPU clusters while preserving hardware isolation. Flex-MIG thus offers a scalable path to better MIG utilization for small-to-medium AI workloads in cloud and on-prem environments.
Abstract
GPU clusters in multi-tenant settings often suffer from underutilization, making GPU-sharing technologies essential for efficient resource use. Among them, NVIDIA Multi-Instance GPU (MIG) has gained traction for providing hardware-level isolation that enables concurrent workloads without interference. However, MIG's hardware rigidity and the conventional one-to-one allocation model jointly lead to severe fragmentation and cluster-wide underutilization. We present Flex-MIG, a software-only framework that replaces the one-to-one allocation model with a one-to-many model and enables host-shared-memory collectives across MIG instances without hardware modification. Flex-MIG eliminates drain-required reconfiguration, reduces fragmentation, and improves makespan by up to 17% across diverse traces, showing that rethinking MIG's operational model as a software-coordinated layer substantially improves cluster efficiency.
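To make the contrast between the two allocation models concrete, the following is a minimal toy sketch, not the Flex-MIG scheduler itself. It assumes each GPU exposes a count of free, interchangeable one-slice MIG leaves, and it ignores real MIG profile and placement constraints. The names `Gpu`, `place_one_to_one`, and `place_one_to_many` are hypothetical, and the "most free slices first" rule is only a stand-in for the size- and topology-aware heuristics mentioned above.

```python
from dataclasses import dataclass


@dataclass
class Gpu:
    name: str
    free_slices: int  # assumption: free capacity modeled as interchangeable 1-slice leaves


def place_one_to_one(gpus, demand):
    """Conventional model: the job must fit entirely on a single GPU."""
    for gpu in gpus:
        if gpu.free_slices >= demand:
            gpu.free_slices -= demand
            return {gpu.name: demand}
    return None  # fragmented capacity cannot be used


def place_one_to_many(gpus, demand):
    """One-to-many model: gather free leaves from several GPUs for one job."""
    if sum(g.free_slices for g in gpus) < demand:
        return None  # not enough total capacity; leave state untouched
    placement = {}
    remaining = demand
    # Illustrative heuristic: visit GPUs with the most free slices first,
    # so the job spans as few GPUs as possible.
    for gpu in sorted(gpus, key=lambda g: g.free_slices, reverse=True):
        if remaining == 0:
            break
        take = min(gpu.free_slices, remaining)
        if take > 0:
            placement[gpu.name] = take
            gpu.free_slices -= take
            remaining -= take
    return placement
```

With free capacity of 2, 1, and 1 slices on three GPUs, a job that needs 3 slices cannot be placed under `place_one_to_one`, whereas `place_one_to_many` places it across two GPUs without any reconfiguration. This is the fragmentation effect the abstract describes, under the simplifying assumptions stated above.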
