Optimizing Data Transfer Performance and Energy Efficiency with Deep Reinforcement Learning
Hasubil Jamil, Jacob Goldverg, Elvis Rodrigues, MD S Q Zulkar Nine, Tevfik Kosar
TL;DR
This work tackles the challenge of high-throughput, energy-efficient data transfers over shared networks. It introduces SPARTA, a multi-parameter DRL framework that tunes concurrency ($cc$) and parallelism ($p$) using reward signals that balance throughput, energy, and fairness. SPARTA can also pause and resume transfers to adapt to traffic conditions. An emulated training environment accelerates learning by reusing real-world transition logs, enabling rapid convergence across multiple DRL algorithms. Empirical results show up to 25% throughput gains and up to 40% end-system energy reductions relative to baselines, with SPARTA-FE achieving stronger fairness and SPARTA-T delivering favorable throughput-energy trade-offs. The approach performs robustly across CloudLab, Chameleon, and FABRIC, which makes it practical for sustainable, high-performance data movement in shared network settings. It also suggests pathways for scaling to larger, multi-agent deployments and additional transport protocols.
Abstract
The rapid growth of data across fields of science and industry has increased the need to improve the performance of end-to-end data transfers while using resources more efficiently. In this paper, we present a dynamic, multiparameter reinforcement learning (RL) framework that adjusts application-layer transfer settings during data transfers on shared networks. Our method balances high throughput against low energy consumption by employing reward signals that focus on both energy efficiency and fairness. The RL agents can pause transfer threads during heavy network use and resume them when resources become available, which prevents overload and saves energy. We evaluate several RL techniques and compare our solution with state-of-the-art methods by measuring computational overhead, adaptability, throughput, and energy consumption. Our experiments show an increase of up to 25% in throughput and a reduction of up to 40% in energy usage at the end systems compared to baseline methods, highlighting a fair and energy-efficient way to optimize data transfers in shared network environments.
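
To make the idea of a reward that weighs throughput, energy, and fairness more concrete, the following is a minimal Python sketch and not the reward used in the paper. Everything in it is an assumption made for illustration: the names `TransferState`, `reward`, and `apply_action`, the choice of Jain's fairness index as the fairness term, the throughput-per-joule form of the efficiency term, the bounds on $cc$ and $p$, and the reading of concurrency as simultaneous files and parallelism as streams per file.

```python
from dataclasses import dataclass


@dataclass
class TransferState:
    # Hypothetical state; the field meanings are assumptions, not the paper's.
    concurrency: int  # cc: assumed to be the number of files moved at once
    parallelism: int  # p: assumed to be the number of streams per file


def jain_fairness(throughputs):
    """Jain's fairness index: 1.0 when every flow gets the same throughput."""
    total = sum(throughputs)
    squares = sum(t * t for t in throughputs)
    if squares == 0:
        return 1.0  # no traffic at all is treated as trivially fair
    return total * total / (len(throughputs) * squares)


def reward(throughput_gbps, energy_joules, flow_throughputs, fairness_weight=1.0):
    """Illustrative reward: throughput per unit energy, scaled by fairness."""
    efficiency = throughput_gbps / max(energy_joules, 1e-9)
    return efficiency * jain_fairness(flow_throughputs) ** fairness_weight


def apply_action(state, delta_cc, delta_p, max_cc=16, max_p=16):
    """Apply an agent action as signed changes to cc and p, clamped to bounds.

    Lowering either value corresponds to pausing threads, and raising it
    corresponds to resuming them.
    """
    cc = min(max(state.concurrency + delta_cc, 1), max_cc)
    p = min(max(state.parallelism + delta_p, 1), max_p)
    return TransferState(cc, p)
```

In this sketch, a transfer that monopolizes the link earns a lower reward than one that reaches the same throughput at the same energy cost while sharing bandwidth evenly, which is the kind of trade-off the abstract describes.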
