Compass: A Decentralized Scheduler for Latency-Sensitive ML Workflows

Yuting Yang; Andrea Merlina; Weijia Song; Tiancheng Yuan; Ken Birman; Roman Vitenberg

TL;DR

Compass targets latency-sensitive, DAG-structured ML workflows on edge clusters by co-designing decentralized scheduling and GPU memory caching. It uses a two-phase strategy, planning followed by dynamic adjustment, that accounts for data locality, GPU cache contents, and inter-task dependencies, with all state shared through a fully decentralized, Derecho-backed fabric. Empirical results show 2x–6x reductions in end-to-end latency using the same or fewer resources; in one case, half the servers sufficed for the same workload. The approach sustains high cache hit rates and efficient use of GPU memory, and its performance holds across bursty and production-like traces.

Abstract

We consider ML query processing in distributed systems where GPU-enabled workers coordinate to execute complex queries: a computing style often seen in applications that interact with users in support of image processing and natural language processing. In such systems, coscheduling of GPU memory management and task placement represents a promising opportunity. We propose Compass, a novel framework that unifies these functions to reduce job latency while using resources efficiently, placing tasks where data dependencies will be satisfied, collocating tasks from the same job (when this will not overload the host or its GPU), and efficiently managing GPU memory. Comparison with other state-of-the-art schedulers shows a significant reduction in completion times while requiring the same or even fewer resources. In one case, just half the servers were needed for processing the same workload.

Paper Structure (38 sections, 5 equations, 10 figures, 1 table, 2 algorithms)

Introduction
Deployment Scenarios and Environment
Dataflow Graphs
Deployment Assumptions
Scheduler Objectives
System Architecture
Repository of Workflow Profiles
Scheduler and Task Dispatcher
GPU Memory Manager
Global State Monitor
Compass Scheduler Design
Parameters
Planning Phase
Vertex Ranking
Task Assignments
...and 23 more sections

Figures (10)

Figure 1: Pipelines
Figure 2: Worker Components in Compass
Figure 3: Example of Job Instance Handling
Figure 4: Network Transfer between Nodes
Figure 5: Compass Shared State Table (SST)
...and 5 more figures
