SMDP-Based Dynamic Batching for Improving Responsiveness and Energy Efficiency of Batch Services
Yaodan Xu, Sheng Zhou, Zhisheng Niu
TL;DR
The paper addresses dynamic batching for online batch services whose service times depend on the batch size. Under Poisson arrivals, it formulates the problem as an infinite-state semi-Markov decision process (SMDP) that minimizes a weighted sum of latency and energy consumption. To make offline computation of near-optimal batching policies tractable, it introduces a finite-state approximation in which an abstract cost stands in for the "tail" states, converts the resulting SMDP into a discrete-time MDP (DTMDP), and solves that DTMDP with relative value iteration. Key contributions include a rigorous SMDP formulation for batch-service queues with size-dependent processing times; large reductions in computational complexity from the tail abstraction (up to 63.5% in space and 98% in time); and extensive numerical results showing that SMDP-derived policies achieve better latency-energy tradeoffs and lighter tail latency than benchmark batching schemes. The findings are relevant to ML inference serving and other online computing services, offering a flexible framework for balancing responsiveness and energy efficiency on batch-enabled servers, with natural extensions to multi-processor systems and bursty traffic.
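The solution pipeline summarized above (truncate the state space, convert the SMDP into a discrete-time MDP, solve it by relative value iteration) can be sketched in Python. This is a minimal illustration under assumptions that are ours, not the paper's: the state is only the number of jobs waiting at a decision epoch, a batch of size b takes a hypothetical affine time alpha*b + beta, the server draws a fixed p_busy or p_idle, and queue lengths beyond N are simply clipped to N rather than receiving the paper's abstract tail cost. The SMDP-to-DTMDP step uses the standard uniformization transformation (cost rate c/t, jump probabilities scaled by tau/t), which preserves the long-run average cost per unit time. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(mean, kmax):
    """Return [P(K=0), ..., P(K=kmax)] for K ~ Poisson(mean)."""
    pmf = [math.exp(-mean)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * mean / k)  # P(k) = P(k-1) * mean / k
    return pmf


def build_smdp(lam, N, B, alpha, beta, p_busy, p_idle, w):
    """Truncated batching SMDP: {(n, b): (mean sojourn, expected cost, {next: prob})}.

    n = jobs waiting at a decision epoch (0..N); b = batch size, b = 0 means wait.
    Cost per sojourn = (integral of jobs in system) / lam + w * energy, so, ignoring
    truncation, the cost rate is mean response time (Little's law) + w * mean power.
    """
    model = {}
    for n in range(N + 1):
        # Wait for the next Poisson arrival: mean sojourn 1/lam at idle power.
        model[(n, 0)] = (1 / lam, n / lam**2 + w * p_idle / lam, {min(n + 1, N): 1.0})
        for b in range(1, min(n, B) + 1):
            T = alpha * b + beta                      # hypothetical affine batch latency
            held = n * T + lam * T * T / 2            # E[integral of jobs in system]
            base = n - b                              # jobs left behind in the queue
            pmf = poisson_pmf(lam * T, N - base - 1)  # arrivals during the service
            nxt = {base + k: p for k, p in enumerate(pmf)}
            nxt[N] = max(0.0, 1.0 - sum(pmf))         # clip overflow into state N
            model[(n, b)] = (T, held / lam + w * p_busy * T, nxt)
    return model


def relative_value_iteration(model, tol=1e-6, max_iter=100_000):
    """Uniformize the SMDP into a DTMDP and solve it by relative value iteration.

    Returns (g, policy): g estimates the optimal long-run cost per unit time and
    policy[n] is the batch size chosen with n jobs waiting (0 = wait).
    """
    states = sorted({s for s, _ in model})
    tau = 0.5 * min(t for t, _, _ in model.values())  # uniformization constant
    h = {s: 0.0 for s in states}
    ref = states[0]                                    # reference state, h[ref] = 0
    for _ in range(max_iter):
        Th, policy = {}, {}
        for (s, a), (t, c, nxt) in model.items():
            # DTMDP step: cost c/t; move per nxt w.p. tau/t, otherwise stay in s.
            stay = 1 - tau / t
            q = c / t + (tau / t) * sum(p * h[j] for j, p in nxt.items()) + stay * h[s]
            if s not in Th or q < Th[s]:
                Th[s], policy[s] = q, a
        diff = [Th[s] - h[s] for s in states]
        if max(diff) - min(diff) < tol:                # span stopping criterion
            return (max(diff) + min(diff)) / 2, [policy[s] for s in states]
        h = {s: Th[s] - Th[ref] for s in states}       # relative normalization
    raise RuntimeError("relative value iteration did not converge")


model = build_smdp(lam=2.0, N=25, B=6, alpha=0.1, beta=0.5, p_busy=10.0, p_idle=2.0, w=0.05)
g, policy = relative_value_iteration(model)
print(f"approx. optimal cost rate {g:.4f}")
print("batch size per queue length:", policy)
```

Sweeping the hypothetical energy weight w and recording the resulting (mean response time, mean power) pairs is one way to trace a latency-energy tradeoff curve of the kind the paper studies; the sketch makes no claim to reproduce its numbers.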
Abstract
For servers with parallel computing resources, batching is a pivotal technique for providing efficient and economical services at scale. Parallel computing resources achieve higher computational and energy efficiency when operating with larger batch sizes. For online services, however, a larger batch size may lead to longer response times. This paper aims to provide a dynamic batching scheme that carefully balances latency and efficiency. The system is modeled as a batch-service queue with size-dependent service times. The design of dynamic batching is then formulated as a semi-Markov decision process (SMDP) problem, with the objective of minimizing the weighted sum of average response time and average power consumption. A method is proposed to derive an approximately optimal SMDP solution, which represents the chosen dynamic batching policy. By introducing an abstract cost to reflect the impact of "tail" states, the space complexity and the time complexity of the procedure can be reduced by 63.5% and 98%, respectively. Numerical results show the superiority of SMDP-based batching policies across various parameter setups. Additionally, the proposed scheme exhibits notable flexibility in balancing power consumption and latency.
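The efficiency argument can be made concrete with a toy latency model. The sketch below assumes, hypothetically and not from the paper, that a batch of size b takes T(b) = alpha*b + beta time units at constant power p_busy. Because the fixed overhead beta is shared across the batch, throughput b/T(b) rises and energy per job p_busy*T(b)/b falls as b grows, while every job in the batch waits the full, longer T(b). That is the tension a dynamic batching policy has to balance.

```python
def batch_profile(b, alpha=0.1, beta=0.5, p_busy=10.0):
    """Return (batch latency, throughput, energy per job) for batch size b.

    Hypothetical affine latency T(b) = alpha*b + beta at constant power p_busy:
    the fixed overhead beta is shared by all b jobs in the batch.
    """
    T = alpha * b + beta
    return T, b / T, p_busy * T / b


for b in (1, 2, 4, 8, 16):
    T, thr, e = batch_profile(b)
    print(f"b={b:2d}  latency={T:.2f}  throughput={thr:.2f} jobs/unit  energy/job={e:.3f}")
```

In practice, measured latency and power profiles for the target hardware would replace these placeholder values.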
