Trident: Adaptive Scheduling for Heterogeneous Multimodal Data Pipelines

Ding Pan, Zhuangzhuang Zhou, Long Qian, Binhang Yuan

TL;DR

Trident is an adaptive scheduling framework for heterogeneous multimodal pipelines on fixed-resource clusters. It improves end-to-end throughput by up to 2.01x on a document curation pipeline and 1.88x on a video curation pipeline over a static baseline, with low overhead suitable for online re-optimization.

Abstract

The rapid adoption of large language models and multimodal foundation models has made multimodal data preparation pipelines critical AI infrastructure. These pipelines interleave CPU-heavy preprocessing with accelerator-backed (GPU/NPU/TPU) inference and produce massive intermediate artifacts. Achieving high throughput is difficult because workloads are highly non-stationary: regime shifts, input-dependent inference, and transient memory spikes cause rapid performance fluctuations and out-of-memory (OOM) failures. Existing schedulers typically rely on threshold-based autoscaling or assume synchronous, homogeneous operators, leading to poor efficiency. We present Trident, an adaptive scheduling framework for heterogeneous multimodal pipelines on fixed-resource clusters. Trident closes the loop across three coupled layers: (i) an observation layer that estimates per-operator sustainable throughput for asynchronous operators via Gaussian Process regression with anomaly filtering; (ii) an adaptation layer that detects workload shifts online and performs memory-constrained Bayesian optimization to recommend OOM-safe configurations; and (iii) a scheduling layer that solves a mixed-integer linear program to jointly optimize operator parallelism, placement, and configuration transitions under heterogeneous compute and bandwidth constraints, accounting for cold-start overhead via rolling updates. Decisions trigger sample invalidation and model refresh to keep estimates consistent with the active configuration. Implemented on Ray Data, Trident improves end-to-end throughput by up to 2.01x on a document curation (PDF) pipeline and 1.88x on a video curation pipeline over a static baseline, with low overhead suitable for online re-optimization.
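
To make the scheduling layer's objective concrete, here is a minimal sketch of choosing operator parallelism on a fixed-resource cluster. It is not the paper's MILP: it uses exhaustive search on a toy three-stage pipeline, ignores placement, bandwidth, and cold-start overhead, and every operator name, rate, and cluster size below is a hypothetical value chosen for illustration.

```python
from itertools import product

# Hypothetical per-instance throughput (items/s) and per-instance resource cost.
OPERATORS = {
    "parse":  {"rate": 40.0, "cpu": 2, "gpu": 0},
    "ocr":    {"rate": 15.0, "cpu": 1, "gpu": 1},
    "filter": {"rate": 60.0, "cpu": 1, "gpu": 0},
}
CLUSTER = {"cpu": 16, "gpu": 4}  # assumed fixed cluster capacity


def pipeline_throughput(parallelism):
    """A linear pipeline is limited by its slowest stage."""
    return min(OPERATORS[name]["rate"] * n for name, n in parallelism.items())


def fits(parallelism):
    """Check that total CPU and GPU demand stays within the cluster budget."""
    return all(
        sum(OPERATORS[name][res] * n for name, n in parallelism.items()) <= cap
        for res, cap in CLUSTER.items()
    )


def best_allocation(max_instances=8):
    """Enumerate instance counts and keep the first feasible maximum."""
    names = list(OPERATORS)
    best, best_tp = None, 0.0
    for counts in product(range(1, max_instances + 1), repeat=len(names)):
        candidate = dict(zip(names, counts))
        if not fits(candidate):
            continue
        tp = pipeline_throughput(candidate)
        if tp > best_tp:
            best, best_tp = candidate, tp
    return best, best_tp


if __name__ == "__main__":
    allocation, throughput = best_allocation()
    print(allocation, throughput)
```

In this toy instance the GPU-bound `ocr` stage caps the pipeline, so adding instances to the other stages beyond what matches it yields no gain. Enumeration grows exponentially with the number of operators, which is one reason a solver-based formulation is attractive once placement and transition costs are included.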

Paper Structure

This paper contains 34 sections, 19 equations, 3 figures, 6 tables, 2 algorithms.

Figures (3)

  • Figure 1: Trident system architecture. The metrics collector gathers runtime statistics from operator instances and feeds them to two parallel paths: the observation layer filters anomalies and estimates operator throughput via Gaussian Process regression, while the adaptation layer performs workload clustering and maintains memory-aware configuration recommendations via memory-constrained Bayesian optimization. The scheduling layer integrates capacity estimates and configuration recommendations to solve an MILP that jointly optimizes parallelism, placement, and configuration transitions. Upon committing a configuration transition, the scheduling layer signals the observation layer to invalidate stale samples (path ➈), ensuring capacity estimates remain consistent with the active configuration.
  • Figure 2: End-to-end throughput comparison on the PDF processing and video curation pipelines. Methods differ in their system-level coverage of capacity estimation, configuration tuning, and resource scheduling (see the coverage table in the paper). Trident integrates all three layers in a closed loop. Speedup is reported relative to the Static baseline.
  • Figure 3: Ablation study. Throughput is normalized to the full Trident system (100%). Removing either layer degrades performance on both pipelines, with the observation layer contributing more in both cases.
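
The feedback path in Figure 1, where a committed configuration transition invalidates stale samples, can be sketched as follows. This is an illustrative stand-in rather than the paper's method: it swaps in a median-absolute-deviation filter in place of the paper's anomaly filtering and Gaussian Process regression, and the class name, method names, and threshold are hypothetical.

```python
from statistics import median


class ThroughputWindow:
    """Hypothetical per-operator sample buffer tied to one active configuration."""

    def __init__(self, config_id, mad_threshold=3.5):
        self.config_id = config_id
        self.mad_threshold = mad_threshold  # assumed cutoff for the modified z-score
        self.samples = []

    def add(self, config_id, items_per_sec):
        # Drop measurements reported under a configuration that is no longer active.
        if config_id == self.config_id:
            self.samples.append(items_per_sec)

    def on_config_transition(self, new_config_id):
        # Samples gathered under the old configuration no longer describe the operator.
        self.config_id = new_config_id
        self.samples.clear()

    def filtered(self):
        """Return samples whose modified z-score is within the threshold."""
        if len(self.samples) < 3:
            return list(self.samples)
        med = median(self.samples)
        mad = median(abs(x - med) for x in self.samples)
        if mad == 0:
            # No spread to scale against, so keep every sample.
            return list(self.samples)
        return [
            x for x in self.samples
            if 0.6745 * abs(x - med) / mad <= self.mad_threshold
        ]


if __name__ == "__main__":
    window = ThroughputWindow("cfg-a")
    for value in (10.0, 10.5, 9.8, 10.2, 2.0):  # 2.0 mimics a transient stall
        window.add("cfg-a", value)
    print(window.filtered())
    window.on_config_transition("cfg-b")
    print(window.samples)
```

Tagging each sample with the configuration it was measured under keeps late-arriving measurements from the previous configuration out of the refreshed estimate.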