CARMA: Collocation-Aware Resource Manager

Ehsan Yousefzadeh-Asl-Miandoab; Reza Karimzadeh; Bulat Ibragimov; Florina M. Ciorba; Pınar Tözün

TL;DR

CARMA tackles GPU underutilization in deep learning training by collocating tasks on the same GPU while controlling the risk of out-of-memory (OOM) crashes and interference. It combines fine-grained GPU monitoring, GPU memory need estimators, risk-based placement policies, and a lightweight OOM recovery mechanism to improve utilization while protecting quality of service. On a DL training workload derived from real-world traces, the best-performing collocation policy raises SM utilization by 54%, per-SM parallelism by 61%, and memory use by 62%, cutting makespan by about 35% and GPU energy consumption by about 15%. Its server-scale focus and recovery-first design make collocation more robust for real-world DL workloads, and the framework lays groundwork for future extensions to multi-server settings and inference workloads.

Abstract

GPUs running deep learning (DL) workloads are frequently underutilized. Collocating multiple DL training tasks on the same GPU can improve utilization but introduces two key risks: (1) out-of-memory (OOM) crashes for newly scheduled tasks, and (2) severe performance interference among co-running tasks, which can negate any throughput gains. These issues reduce system robustness, quality of service, and energy efficiency. We present CARMA, a task-level, collocation-aware resource management system that operates at server scale. CARMA addresses collocation challenges via (1) fine-grained monitoring and bookkeeping of GPUs and a collocation risk analysis that filters out high-risk GPUs; (2) task placement policies that cap GPU utilization to avoid OOMs and limit interference; (3) integration of GPU memory need estimators for DL tasks to minimize OOMs during collocation; and (4) a lightweight recovery method that relaunches jobs that crashed due to OOMs. Our evaluation on a DL training workload derived from real-world traces shows that CARMA uses GPUs more efficiently by making more informed collocation decisions: for the best-performing collocation policy, CARMA increases GPU streaming multiprocessor (SM) utilization by 54%, the parallelism achieved per SM by 61%, and memory use by 62%. This results in reductions of approximately 35% in end-to-end execution time (makespan) and approximately 15% in GPU energy consumption for this workload.
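
To make the placement idea in the abstract concrete, the following is a minimal Python sketch of a cap-based collocation filter. It is not CARMA's implementation: the `GpuState` fields, the `pick_gpu` function, and the two thresholds (`sm_util_cap_pct`, `mem_cap_frac`) are hypothetical names and values chosen for illustration, and the "least projected memory use" tie-break is an assumption rather than one of the paper's policies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuState:
    """Snapshot of one GPU as a monitor might report it (hypothetical fields)."""
    gpu_id: int
    mem_total_mib: int
    mem_used_mib: int
    sm_util_pct: float


def pick_gpu(
    gpus: List[GpuState],
    est_mem_mib: int,
    sm_util_cap_pct: float = 80.0,  # assumed cap, not a value from the paper
    mem_cap_frac: float = 0.9,      # assumed cap, not a value from the paper
) -> Optional[int]:
    """Return the id of a GPU that can host the task, or None if all are too risky."""
    candidates = []
    for g in gpus:
        # Risk filter: skip GPUs whose SMs are already heavily used.
        if g.sm_util_pct >= sm_util_cap_pct:
            continue
        # OOM guard: skip GPUs where the estimated memory need would exceed the cap.
        projected = g.mem_used_mib + est_mem_mib
        if projected > mem_cap_frac * g.mem_total_mib:
            continue
        candidates.append((projected / g.mem_total_mib, g.gpu_id))
    if not candidates:
        return None  # caller would keep the task queued and retry later
    # Assumed tie-break: prefer the GPU with the lowest projected memory fraction.
    return min(candidates)[1]


if __name__ == "__main__":
    server = [
        GpuState(0, 40000, 30000, 50.0),
        GpuState(1, 40000, 10000, 95.0),
        GpuState(2, 40000, 5000, 30.0),
    ]
    print(pick_gpu(server, est_mem_mib=8000))   # GPU 2 passes both caps
    print(pick_gpu(server, est_mem_mib=35000))  # no GPU passes, so None
```

In this sketch, a task for which `pick_gpu` returns `None` would stay in the queue, and a task that still crashes with an OOM after placement would be resubmitted, mirroring the recovery step described in point (4) of the abstract.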
