Globus Service Enhancements for Exascale Applications and Facilities

Weijian Zheng, Jack Kordas, Tyler J. Skluzacek, Raj Kettimuthu, Ian Foster

TL;DR

The paper tackles the challenge of moving extremely large files in exascale workflows, where traditional Globus/GridFTP optimizations aimed at many small files are insufficient. It proposes client-side chunking to partition large files across multiple data movers (DTNs) and to overlap transfer with integrity checking via ERET/ESTO; performance is evaluated at three facilities that use Lustre file systems. The experiments show substantial gains: speedups of up to 9.5× for single large-file transfers, significantly lower checksum overhead when chunking is used, and a strong dependence of transfer performance on Lustre striping configuration. The work highlights practical implications for exascale data workflows and points to automation and further optimization of integrity checks and storage-system configurations.

Abstract

Many extreme-scale applications require the movement of large quantities of data to, from, and among leadership computing facilities, as well as other scientific facilities and the home institutions of facility users. These applications, particularly when leadership computing facilities are involved, can touch upon edge cases (e.g., terabyte files) that had not been a focus of previous Globus optimization work, which had emphasized rather the movement of many smaller (megabyte to gigabyte) files. We report here on how automated client-driven chunking can be used to accelerate both the movement of large files and the integrity checking operations that have proven to be essential for large data transfers. We present detailed performance studies that provide insights into the benefits of these modifications in a range of file transfer scenarios.
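
The chunking idea described above can be illustrated with a small local sketch. This is not the Globus or GridFTP implementation: the function names (`plan_chunks`, `copy_and_verify_chunk`, `chunked_copy`), the use of SHA-256, and the thread pool standing in for multiple data movers are all assumptions made for illustration. It only shows the general pattern of splitting one large file into byte ranges, moving the ranges concurrently, and verifying each range as soon as it lands rather than after the whole file has arrived.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(file_size, chunk_size):
    """Split [0, file_size) into (offset, length) byte ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]


def copy_and_verify_chunk(src, dst, offset, length):
    """Copy one byte range, then re-read the destination and compare digests."""
    with open(src, "rb") as f:
        f.seek(offset)
        data = f.read(length)  # whole chunk in memory: fine for a sketch only
    sent = hashlib.sha256(data).hexdigest()
    with open(dst, "r+b") as f:
        f.seek(offset)
        f.write(data)
    with open(dst, "rb") as f:
        f.seek(offset)
        received = hashlib.sha256(f.read(length)).hexdigest()
    return offset, sent == received


def chunked_copy(src, dst, chunk_size, workers=4):
    """Copy src to dst chunk by chunk; return True if every chunk verified."""
    size = os.path.getsize(src)
    # Pre-size the destination so workers can write at independent offsets.
    with open(dst, "wb") as f:
        f.truncate(size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker plays the role of one (hypothetical) data mover.
        results = pool.map(lambda c: copy_and_verify_chunk(src, dst, *c),
                           plan_chunks(size, chunk_size))
        return all(ok for _, ok in results)
```

Because each chunk is verified independently, a failed check would only require resending that chunk; the sketch simply reports whether all chunks matched.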

Paper Structure

This paper contains 16 sections, 10 figures, 1 table.

Figures (10)

  • Figure 1: A modern data transfer infrastructure connects high-speed storage to wide area networks via a clean, high-bandwidth network path with one or more data transfer nodes (DTNs) hosting Globus Connect agents. The cloud-hosted Globus service acts as a client to the Globus Connect agents, instructing them to perform file transfers in response to user requests.
  • Figure 2: Concurrency (multiple data movers) and parallelism (multiple TCP connections) as implemented in Globus GridFTP.
  • Figure 3: Pipelining in Globus GridFTP. Delays due to waiting for acknowledgements (left) are reduced by sending multiple requests at once (right).
  • Figure 4: A sketch of activity over time for non-chunked (above) and chunked (below) transfers, both with integrity checking. With 'time' on the horizontal axis, the non-chunked transfer must wait until the entire file is transferred (blue) before performing its integrity check (orange), leading to longer end-to-end times. In the chunked case, not only do multiple GridFTP processes transfer different portions of a file in parallel, but transfers and integrity checks also execute concurrently. (For simplicity, we show the integrity check cost being incurred only after the transfer; in practice, some modest cost is also incurred when first reading the file.)
  • Figure 5: Impact of Lustre stripe count on Globus transfer performance for a 1×2.5 TB file transfer between ALCF (A) and NERSC (N), both with and without chunking. All transfers were conducted without integrity checking.
  • ...and 5 more figures
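
The timing picture sketched in the Figure 4 caption can be approximated with a toy model. The model below is an assumption made for illustration, not the paper's analysis: it treats each data mover as a two-stage pipeline (transfer a chunk, then check it while the next chunk transfers), assumes equal-size chunks, fixed per-mover rates that do not degrade as movers are added, and ignores the read-side checksum cost that the caption mentions. All parameter names are hypothetical, and units are arbitrary as long as they are consistent.

```python
import math


def non_chunked_time(size, transfer_rate, check_rate):
    """One mover: transfer the whole file, then check the whole file."""
    return size / transfer_rate + size / check_rate


def chunked_time(size, chunk, movers, transfer_rate, check_rate):
    """Several movers, each overlapping transfer and check of its chunks.

    Every chunk is treated as full-size, so a short final chunk makes this
    a slight overestimate.
    """
    n_chunks = math.ceil(size / chunk)
    if n_chunks == 0:
        return 0.0
    per_mover = math.ceil(n_chunks / movers)  # chunks handled by the busiest mover
    t = chunk / transfer_rate                 # time to transfer one chunk
    c = chunk / check_rate                    # time to check one chunk
    # Two-stage pipeline: first transfer, then the slower stage paces the
    # remaining chunks, then the final check.
    return t + (per_mover - 1) * max(t, c) + c
```

In this simplified model the gain comes from two sources, matching the caption: work is divided across movers, and all but the last integrity check is hidden behind an ongoing transfer when checking is no slower than transferring.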