DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai, Keqiang He, An Wang
TL;DR
DYNAMIX addresses adaptive batch size selection in heterogeneous distributed machine learning by formulating batch sizing as a sequential decision problem solved with Proximal Policy Optimization (PPO). A centralized PPO agent observes multi-domain state information (network, system, and training metrics) and selects discrete per-node batch-size adjustments to optimize both convergence quality and resource efficiency, guided by a reward that balances generalization, efficiency, and gradient stability. The framework integrates with diverse platforms via eBPF-based monitoring, gRPC communication, and BytePS compatibility, and its learned policies transfer within model families and across frameworks. Empirically, DYNAMIX achieves up to a 6.3% gain in final accuracy and a 46% reduction in training time, scales to 32-node clusters, and maintains low overhead, which suggests practical value for real-world heterogeneous clusters.
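To make the action and reward structure described above concrete, the following is a minimal sketch of how discrete per-node adjustments and a three-term reward could be expressed. The action set `ACTION_DELTAS`, the bounds `MIN_BATCH` and `MAX_BATCH`, the reward weights, and both function names are hypothetical choices for illustration; they are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical discrete action set: change in batch size per decision step.
ACTION_DELTAS: Tuple[int, ...] = (-64, -32, 0, 32, 64)
# Assumed batch-size bounds, chosen only for this illustration.
MIN_BATCH, MAX_BATCH = 32, 1024


def apply_actions(batch_sizes: Sequence[int], actions: Sequence[int]) -> List[int]:
    """Apply one discrete action index per node and clamp to the allowed range."""
    if len(batch_sizes) != len(actions):
        raise ValueError("need exactly one action per node")
    updated = []
    for size, action in zip(batch_sizes, actions):
        new_size = size + ACTION_DELTAS[action]
        updated.append(max(MIN_BATCH, min(MAX_BATCH, new_size)))
    return updated


def combined_reward(
    generalization: float,
    efficiency: float,
    grad_instability: float,
    weights: Tuple[float, float, float] = (1.0, 0.5, 0.25),  # assumed weights
) -> float:
    """Weighted sum that rewards generalization and efficiency and
    penalizes gradient instability (one possible form, assumed here)."""
    w_gen, w_eff, w_stab = weights
    return w_gen * generalization + w_eff * efficiency - w_stab * grad_instability


if __name__ == "__main__":
    # Three nodes: grow the first, shrink the second, grow the third.
    print(apply_actions([128, 32, 1024], [4, 0, 4]))
    print(combined_reward(1.0, 2.0, 4.0))
```

Clamping keeps every node inside a valid range even when the agent repeatedly picks the same direction, and a linear combination is only one plausible way to balance the three reward terms.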
Abstract
Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequential decision-making problem solved with Proximal Policy Optimization (PPO). Our approach employs a multi-dimensional state representation encompassing network-level metrics, system-level resource utilization, and indicators of training statistical efficiency, enabling informed decisions across diverse computational resources. It eliminates the need for explicit system modeling while integrating seamlessly with existing distributed training frameworks. In evaluations across diverse workloads, hardware configurations, and network conditions, DYNAMIX achieves up to a 6.3% improvement in final model accuracy and a 46% reduction in total training time. Our scalability experiments demonstrate that DYNAMIX maintains the best performance as cluster size increases to 32 nodes, while policy transfer experiments show that learned policies generalize effectively across related model architectures.
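As an illustration of the multi-dimensional state representation, the sketch below concatenates per-node network, system, and training metrics into one flat vector that a centralized agent could consume. The `NodeObservation` class, its field names, and `build_state` are assumptions made for this example; the abstract does not specify which metrics are collected or how they are encoded.

```python
from dataclasses import astuple, dataclass
from typing import List


@dataclass
class NodeObservation:
    """One node's metrics at a decision step (all fields are hypothetical)."""
    # Network-level metrics
    throughput_gbps: float
    retransmit_rate: float
    # System-level resource utilization
    cpu_util: float
    mem_util: float
    # Training statistical efficiency indicators
    loss: float
    grad_norm: float


def build_state(observations: List[NodeObservation]) -> List[float]:
    """Concatenate per-node metrics, in node order, into a flat state vector."""
    state: List[float] = []
    for obs in observations:
        state.extend(astuple(obs))
    return state


if __name__ == "__main__":
    nodes = [
        NodeObservation(9.5, 0.01, 0.80, 0.60, 1.25, 3.0),
        NodeObservation(4.0, 0.05, 0.95, 0.70, 1.30, 4.5),
    ]
    state = build_state(nodes)
    print(len(state), state[:6])
```

A flat concatenation is the simplest encoding; in practice the metrics would likely need normalization before being fed to a policy network, which is omitted here.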
