JANUS: Resilient and Adaptive Data Transmission for Enabling Timely and Efficient Cross-Facility Scientific Workflows

Vladislav Esaulov, Jieyang Chen, Norbert Podhorszki, Fred Suter, Scott Klasky, Anu G Bourgeois, Lipeng Wan

TL;DR

The paper tackles the challenge of transferring massive scientific datasets across wide-area networks for cross-facility workflows, where TCP-based transfers incur high latency and inefficiency. It introduces JANUS, a UDP-based transfer approach that combines erasure coding with error-bounded lossy compression and uses multilevel data refactoring (pMGARD) to enable progressive reconstruction and adjustable redundancy. It formalizes two optimization models, one that minimizes transmission time under an error bound and one that minimizes error under a time constraint, and develops adaptive transfer protocols that adjust erasure coding parameters in real time. Through extensive simulations and real-network experiments with Nyx-derived data, JANUS demonstrates superior transfer efficiency and data fidelity, offering a practical path toward timely, scalable data sharing in large-scale scientific collaborations.

Abstract

In modern science, the growing complexity of large-scale scientific projects has led to an increasing reliance on cross-facility scientific workflows, where resources and expertise from multiple institutions and geographic locations are leveraged to accelerate scientific discovery. These workflows often require transmitting huge amounts of scientific data through wide-area networks. Although high-speed networks like ESnet and transfer services such as Globus have improved data mobility, several challenges remain. The sheer volume of data can overwhelm network bandwidth, widely used transport protocols such as TCP suffer from inefficiencies due to retransmissions triggered by packet loss, and existing fault-tolerance mechanisms like erasure coding introduce substantial overhead. In this paper, we propose JANUS, a resilient and adaptable data transmission approach designed for cross-facility scientific workflows. Unlike traditional TCP-based methods, JANUS leverages UDP, integrates erasure coding for fault tolerance, and combines it with error-bounded lossy compression to reduce overhead. This novel design allows users to balance data transmission time and accuracy, optimizing transfer performance based on specific scientific requirements. Additionally, JANUS dynamically adjusts erasure coding parameters in response to real-time network conditions, ensuring efficient data transfers even in fluctuating environments. We develop optimization models for determining ideal configurations and implement adaptive data transfer protocols to enhance reliability. Through extensive simulations and real-network experiments, we demonstrate that JANUS significantly improves transfer efficiency while maintaining data fidelity.
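
To make the idea of adjusting erasure coding parameters to network conditions concrete, here is a minimal Python sketch. It is not the paper's optimization model. It assumes independent packet losses and an idealized erasure code in which any k of the k + m transmitted fragments are enough to recover the data (as with Reed-Solomon codes). The function names, the `target` success probability, and the `max_parity` cap are all hypothetical choices made for illustration.

```python
from math import comb


def recovery_probability(k: int, m: int, loss_rate: float) -> float:
    """Probability that at least k of the k + m fragments arrive.

    Assumption: each fragment is lost independently with probability
    loss_rate, and any k received fragments suffice for recovery.
    """
    n = k + m
    p_ok = 1.0 - loss_rate
    # Sum the binomial probabilities of receiving r = k, ..., n fragments.
    return sum(
        comb(n, r) * p_ok**r * loss_rate ** (n - r) for r in range(k, n + 1)
    )


def min_parity_fragments(k, loss_rate, target=0.999, max_parity=256):
    """Smallest parity count m whose recovery probability meets the target.

    Returns None if no m up to max_parity (a hypothetical cap) is enough.
    """
    for m in range(max_parity + 1):
        if recovery_probability(k, m, loss_rate) >= target:
            return m
    return None


if __name__ == "__main__":
    # Re-evaluate the redundancy level as the observed loss rate changes.
    for observed_loss in (0.0, 0.01, 0.05, 0.10):
        m = min_parity_fragments(k=32, loss_rate=observed_loss)
        print(f"loss rate {observed_loss:.2f}: parity fragments = {m}")
```

Under these assumptions, a sender that re-measures the loss rate periodically could call `min_parity_fragments` again and send only as much redundancy as current conditions require, which is the kind of trade-off between overhead and reliability that the abstract describes.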

Paper Structure

This paper contains 25 sections, 11 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: An illustration of data transfer with JANUS
  • Figure 2: Total time for transferring data with guaranteed error bound under different packet loss rates
  • Figure 3: Error bounds of data received within guaranteed transmission time under different packet loss rates
  • Figure 4: Total time for transferring data with guaranteed error bound using different data transfer protocols
  • Figure 5: Error bounds of data received within guaranteed transmission time