Trident: Adaptive Scheduling for Heterogeneous Multimodal Data Pipelines

Ding Pan; Zhuangzhuang Zhou; Long Qian; Binhang Yuan

TL;DR

Trident is an adaptive scheduling framework for heterogeneous multimodal pipelines on fixed-resource clusters. It improves end-to-end throughput by up to 2.01x on a document curation pipeline and 1.88x on a video curation pipeline over a static baseline, with overhead low enough for online re-optimization.

Abstract

The rapid adoption of large language models and multimodal foundation models has made multimodal data preparation pipelines critical AI infrastructure. These pipelines interleave CPU-heavy preprocessing with accelerator-backed (GPU/NPU/TPU) inference and produce massive intermediate artifacts. Achieving high throughput is difficult because workloads are highly non-stationary: regime shifts, input-dependent inference, and transient memory spikes cause rapid performance fluctuations and out-of-memory (OOM) failures. Existing schedulers typically rely on threshold-based autoscaling or assume synchronous, homogeneous operators, leading to poor efficiency. We present Trident, an adaptive scheduling framework for heterogeneous multimodal pipelines on fixed-resource clusters. Trident closes the loop across three coupled layers: (i) an observation layer that estimates per-operator sustainable throughput for asynchronous operators via Gaussian Process regression with anomaly filtering; (ii) an adaptation layer that detects workload shifts online and performs memory-constrained Bayesian optimization to recommend OOM-safe configurations; and (iii) a scheduling layer that solves a mixed-integer linear program to jointly optimize operator parallelism, placement, and configuration transitions under heterogeneous compute and bandwidth constraints, accounting for cold-start overhead via rolling updates. Decisions trigger sample invalidation and model refresh to keep estimates consistent with the active configuration. Implemented on Ray Data, Trident improves end-to-end throughput by up to 2.01x on a document curation (PDF) pipeline and 1.88x on a video curation pipeline over a static baseline, with low overhead suitable for online re-optimization.
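To make the observation layer's estimation step concrete, here is a minimal, self-contained sketch of Gaussian Process throughput estimation with anomaly filtering. It rests on hypothetical assumptions that do not come from the paper: each operator reports windowed throughput samples tagged with a concurrency level, anomalies (for example, stalled windows) are removed per level with a median-absolute-deviation (modified z-score) rule, and the GP uses a fixed RBF kernel with hand-set hyperparameters. The names `filter_anomalies` and `gp_predict` are illustrative; Trident may use different input features, filters, and hyperparameter fitting.

```python
import math
from collections import defaultdict
from statistics import median


def filter_anomalies(samples, k=3.5):
    """Drop throughput windows that are robust outliers within their concurrency level.

    Hypothetical filter: modified z-score 0.6745 * |y - median| / MAD, rejected when > k.
    """
    groups = defaultdict(list)
    for x, y in samples:
        groups[x].append(y)
    kept = []
    for x, ys in groups.items():
        med = median(ys)
        mad = median([abs(y - med) for y in ys])
        for y in ys:
            if mad == 0:
                ok = y == med  # no spread: keep only values equal to the median
            else:
                ok = 0.6745 * abs(y - med) / mad <= k
            if ok:
                kept.append((x, y))
    return kept


def rbf(a, b, length, signal):
    """Squared-exponential (RBF) kernel."""
    return signal ** 2 * math.exp(-((a - b) ** 2) / (2 * length ** 2))


def cholesky(A):
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def cho_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward substitution."""
    n = len(L)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[m][i] * x[m] for m in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(xs, ys, x_star, length=1.5, signal=30.0, noise=2.0):
    """GP posterior mean and std of throughput at x_star (constant prior mean).

    Hyperparameters are fixed, hypothetical values in items/s, not fitted.
    """
    mu = sum(ys) / len(ys)
    K = [[rbf(a, b, length, signal) + (noise ** 2 if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = cho_solve(L, [y - mu for y in ys])  # K^{-1} (y - mu)
    k_star = [rbf(x_star, a, length, signal) for a in xs]
    mean = mu + sum(kv * av for kv, av in zip(k_star, alpha))
    v = cho_solve(L, k_star)  # K^{-1} k_*
    var = rbf(x_star, x_star, length, signal) - sum(kv * w for kv, w in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical samples: (concurrency level, observed items/s per window).
samples = [
    (1, 20.0), (1, 21.0), (1, 19.5),
    (2, 38.0), (2, 37.0), (2, 4.0),  # 4.0: a stalled window, filtered out
    (4, 70.0), (4, 68.5), (4, 71.0),
]
clean = filter_anomalies(samples)
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean, std = gp_predict(xs, ys, 3.0)  # estimate at an unobserved level
print(f"{mean:.1f} +/- {std:.1f} items/s")
```

One design note: the posterior standard deviation grows at concurrency levels with few samples, which gives a scheduler a natural way to discount uncertain capacity estimates instead of trusting the mean alone.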

Paper Structure (34 sections, 19 equations, 3 figures, 6 tables, 2 algorithms)

Introduction
Background and Motivation
Multimodal Pipeline Characteristics
Offline Resource Constraints
System Overview
System Architecture
Control Flow
Observation Layer
Problem Description
Throughput Modeling by Gaussian Process
Anomaly Detection
Cold Start and Sample Invalidation
Adaptation Layer
Problem Description
Workload Categorization
...and 19 more sections

Figures (3)

Figure 1: Trident system architecture. The metrics collector gathers runtime statistics from operator instances and feeds them to two parallel paths: the observation layer filters anomalies and estimates operator throughput via Gaussian Process regression, while the adaptation layer performs workload clustering and maintains memory-aware configuration recommendations via memory-constrained Bayesian optimization. The scheduling layer integrates capacity estimates and configuration recommendations to solve an MILP that jointly optimizes parallelism, placement, and configuration transitions. Upon committing a configuration transition, the scheduling layer signals the observation layer to invalidate stale samples (path ➈), ensuring capacity estimates remain consistent with the active configuration.
Figure 2: End-to-end throughput comparison on the PDF processing and video curation pipelines. Methods differ in their system-level coverage of capacity estimation, configuration tuning, and resource scheduling (see the method-coverage table, tab:coverage). Trident integrates all three layers in a closed loop. Speedup is reported relative to the Static baseline.
Figure 3: Ablation study. Throughput is normalized to the full Trident system (100%). Removing either layer degrades performance on both pipelines, with the observation layer contributing more in both cases.
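The scheduling layer's objective from Figure 1 can be illustrated on a toy instance. The sketch below is not the paper's MILP: it exhaustively enumerates integer replica counts for a linear pipeline, assuming every item passes through each operator exactly once (so pipeline throughput is the bottleneck minimum over operators of replicas times per-replica rate), and it enforces cluster-wide CPU and GPU budgets. As a hypothetical stand-in for configuration-transition cost, ties are broken by the fewest replica changes relative to the current allocation. Placement across nodes, bandwidth limits, and rolling-update timing are omitted, and the function name, operator names, rates, and budgets are all illustrative.

```python
from itertools import product


def best_allocation(ops, cpu_budget, gpu_budget, current, max_replicas=8):
    """Brute-force stand-in for a parallelism MILP on a tiny, linear pipeline.

    Assumes unit selectivity, so pipeline throughput = min over operators of
    replicas * per-replica rate. Ties are broken by fewest replica changes
    versus `current` (a hypothetical proxy for cold-start cost).
    """
    names = list(ops)
    best_key, best_alloc = None, None
    for counts in product(range(1, max_replicas + 1), repeat=len(names)):
        alloc = dict(zip(names, counts))
        cpu = sum(ops[n]["cpu"] * c for n, c in alloc.items())
        gpu = sum(ops[n]["gpu"] * c for n, c in alloc.items())
        if cpu > cpu_budget or gpu > gpu_budget:
            continue  # violates a cluster-wide resource budget
        throughput = min(ops[n]["rate"] * c for n, c in alloc.items())
        churn = sum(abs(c - current.get(n, 0)) for n, c in alloc.items())
        key = (throughput, -churn)  # maximize throughput, then minimize churn
        if best_key is None or key > best_key:
            best_key, best_alloc = key, alloc
    if best_alloc is None:
        raise ValueError("no feasible allocation within the budgets")
    return best_alloc, best_key[0]


# Hypothetical PDF-style pipeline: per-replica rates in items/s.
ops = {
    "parse": {"cpu": 2, "gpu": 0, "rate": 30.0},
    "ocr": {"cpu": 1, "gpu": 1, "rate": 25.0},
    "write": {"cpu": 1, "gpu": 0, "rate": 60.0},
}
current = {"parse": 2, "ocr": 4, "write": 1}
alloc, throughput = best_allocation(ops, cpu_budget=16, gpu_budget=4, current=current)
print(alloc, throughput)  # → {'parse': 4, 'ocr': 4, 'write': 2} 100.0
```

Here the GPU budget caps `ocr` at 4 replicas (100 items/s), so the search scales the CPU operators just enough to stop them from becoming the bottleneck. Exhaustive enumeration grows exponentially with the number of operators, which is why a solver-based formulation such as an MILP is the practical choice at realistic scale.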
