DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems

Yuanjun Dai, Keqiang He, An Wang

TL;DR

DYNAMIX tackles the challenge of adaptive batch size selection in heterogeneous distributed machine learning by formulating batch sizing as a sequential decision problem solved with Proximal Policy Optimization. It uses a centralized PPO agent that observes rich multi-domain state information and selects discrete per-node batch-size adjustments to optimize both convergence quality and resource efficiency, guided by a carefully designed reward that balances generalization, efficiency, and gradient stability. The framework integrates with diverse platforms via eBPF-based monitoring, gRPC communication, and BytePS compatibility, and demonstrates robust policy transfer within model families and across frameworks. Empirically, DYNAMIX yields up to 6.3% gains in final accuracy and 46% reductions in training time, scales effectively to 32-node clusters, and maintains low overhead, signaling practical impact for real-world heterogeneous clusters.

Abstract

Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequential decision-making problem using Proximal Policy Optimization (PPO). Our approach employs a multi-dimensional state representation encompassing network-level metrics, system-level resource utilization, and training statistical efficiency indicators to enable informed decision-making across diverse computational resources. It eliminates the need for explicit system modeling while integrating seamlessly with existing distributed training frameworks. Through evaluations across diverse workloads, hardware configurations, and network conditions, DYNAMIX achieves up to 6.3% improvement in final model accuracy and 46% reduction in total training time. Our scalability experiments demonstrate that DYNAMIX maintains the best performance as cluster size increases to 32 nodes, while policy transfer experiments show that learned policies generalize effectively across related model architectures.
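
To make the "discrete per-node batch-size adjustments" concrete, the following is a minimal sketch of how one decision step might update batch sizes once an agent has chosen an action for each node. It is an illustration under stated assumptions, not the authors' implementation: the action set `ACTION_DELTAS`, the bounds `MIN_BATCH` and `MAX_BATCH`, and the function name `apply_actions` are all hypothetical, since this summary does not specify them.

```python
from typing import Dict, List

# Hypothetical discrete action set: each action index maps to a change in
# a node's batch size. The source only says adjustments are discrete;
# these particular values are assumed for illustration.
ACTION_DELTAS: List[int] = [-64, -32, 0, 32, 64]
NO_CHANGE = 2  # index of the zero delta in ACTION_DELTAS

# Assumed bounds that keep batch sizes in a usable range.
MIN_BATCH, MAX_BATCH = 32, 1024


def apply_actions(batch_sizes: Dict[str, int],
                  actions: Dict[str, int]) -> Dict[str, int]:
    """Return new per-node batch sizes after one decision step.

    batch_sizes maps node name -> current batch size.
    actions maps node name -> index into ACTION_DELTAS.
    Nodes without an action keep their current batch size.
    """
    updated = {}
    for node, size in batch_sizes.items():
        delta = ACTION_DELTAS[actions.get(node, NO_CHANGE)]
        # Clamp so an adjustment can never leave the allowed range.
        updated[node] = max(MIN_BATCH, min(MAX_BATCH, size + delta))
    return updated


current = {"node-a": 128, "node-b": 32, "node-c": 256}
chosen = {"node-a": 4, "node-b": 0}  # node-c gets no action
print(apply_actions(current, chosen))
```

In a full system, the `chosen` indices would come from the centralized PPO policy evaluated on the observed network, system, and training metrics; here they are hard-coded so the update rule can be read in isolation.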

Paper Structure

This paper contains 24 sections, 6 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: System Architecture
  • Figure 2: Baseline performance with fixed batch sizes
  • Figure 3: Average and median accumulative rewards
  • Figure 4: Accuracy trajectories during target model training
  • Figure 5: Batch size adjustments during target model training
  • ...and 1 more figure