Modular Architecture for High-Performance and Low Overhead Data Transfers

Rasman Mubtasim Swargo, Engin Arslan, Md Arifuzzaman

TL;DR

The paper addresses the challenge of rapidly moving massive datasets over high-bandwidth, geographically distributed networks whose conditions change over time. It introduces a modular data transfer architecture (AutoMDT) that jointly optimizes three concurrency levels (read, network, and write) with a deep reinforcement learning (DRL) agent based on Proximal Policy Optimization (PPO), trained offline in a simulator of memory-buffer dynamics. The approach yields up to 8x faster convergence and up to 68% shorter transfer times than state-of-the-art baselines on production-grade testbeds (CloudLab and Fabric). Offline training in the simulator takes about 45 minutes and provides a practical path to stable, high-performance data transfers in real-world HPC environments without modifying kernel or transport-layer configurations.

Abstract

High-performance applications necessitate rapid and dependable transfer of massive datasets across geographically dispersed locations. Traditional file transfer tools often suffer from resource underutilization and instability because of fixed configurations or monolithic optimization methods. We propose AutoMDT, a novel modular data transfer architecture that employs a deep reinforcement learning based agent to simultaneously optimize concurrency levels for read, network, and write operations. Our solution incorporates a lightweight network-system simulator, enabling offline training of a Proximal Policy Optimization (PPO) agent in approximately 45 minutes on average, thereby overcoming the impracticality of lengthy online training in production networks. AutoMDT's modular design decouples I/O and network tasks, allowing the agent to capture complex buffer dynamics precisely and to adapt quickly to changing system and network conditions. Evaluations on production-grade testbeds show that AutoMDT achieves up to 8x faster convergence and a 68% reduction in transfer completion times compared with state-of-the-art solutions.
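The buffer dynamics that the abstract refers to can be illustrated with a toy model. The following is a minimal sketch, not the paper's simulator: the class name `BufferSim`, the per-thread rates, the buffer capacity, and the linear scaling of throughput with thread count are all assumptions made for illustration. It only shows how a sender-side and a receiver-side memory buffer couple the read, network, and write stages, so that a slow stage eventually throttles the stages upstream of it.

```python
from dataclasses import dataclass


@dataclass
class BufferSim:
    """Toy two-buffer pipeline: read -> sender buffer -> network -> receiver buffer -> write.

    All numbers are hypothetical (per-thread rates in Gbps, capacity in Gb);
    none of them come from the paper.
    """
    read_rate: float = 1.0     # assumed per-thread read throughput
    net_rate: float = 0.5      # assumed per-thread network throughput
    write_rate: float = 0.8    # assumed per-thread write throughput
    capacity: float = 10.0     # assumed capacity of each memory buffer
    sender_buf: float = 0.0
    receiver_buf: float = 0.0

    def step(self, n_read: int, n_net: int, n_write: int, dt: float = 1.0):
        """Advance the model by dt seconds and return (read, network, write) throughput."""
        # Reads are limited by the free space left in the sender buffer.
        read = min(n_read * self.read_rate * dt, self.capacity - self.sender_buf)
        self.sender_buf += read
        # The network is limited by data available at the sender
        # and by free space in the receiver buffer.
        net = min(n_net * self.net_rate * dt,
                  self.sender_buf,
                  self.capacity - self.receiver_buf)
        self.sender_buf -= net
        self.receiver_buf += net
        # Writes are limited by data available in the receiver buffer.
        write = min(n_write * self.write_rate * dt, self.receiver_buf)
        self.receiver_buf -= write
        return read / dt, net / dt, write / dt


if __name__ == "__main__":
    sim = BufferSim()
    for t in range(12):
        # Two threads per stage: the network is the slowest stage in this setup.
        print(t, sim.step(n_read=2, n_net=2, n_write=2), sim.sender_buf)
```

In this setup the network is the bottleneck, so the sender buffer fills until the read stage is throttled down to the network rate. An agent that observes buffer occupancy alongside throughput could use this kind of signal to decide which concurrency level to raise, which is the intuition behind optimizing the three levels jointly.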

Paper Structure

This paper contains 21 sections, 8 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Dynamics of the file transfer process showing the relationship between read, network, and write throughputs.
  • Figure 2: AutoMDT introduces offline training of a deep reinforcement learning agent to quickly learn the behavior and memory buffer dynamics of the real environment.
  • Figure 3: Performance comparison of AutoMDT and Marlin on the Fabric testbed. Marlin takes $\sim$1.7× longer than AutoMDT to finish the transfer.
  • Figure 4: PPO agent with discrete action space failed to achieve convergence.
  • Figure 5: Performance comparison of AutoMDT (first row) and Marlin (second row). AutoMDT leverages joint optimization and the memory buffer dynamics to quickly identify the bottleneck component, then increases that component's concurrency while keeping concurrency low for the other components. It reaches the optimal solution faster, resulting in higher throughput (third row) and better resource utilization than Marlin.