Object Proxy Patterns for Accelerating Distributed Applications

J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

TL;DR

The paper addresses data-flow bottlenecks in large-scale distributed applications by introducing three high-level proxy-based patterns—distributed futures (ProxyFutures), object streaming (ProxyStream), and ownership—built on an extended ProxyStore framework. The approach decouples data movement from control flow, enables cross-engine deployment, and provides automated lifecycle management for distributed objects. Key contributions include reference implementations, evaluation on synthetic benchmarks, and demonstrations on three scientific applications: 1000 Genomes, DeepDriveMD, and MOF Generation, with substantial gains in makespan, latency, throughput, and memory efficiency. The work advances portable, scalable, and efficient data-sharing patterns across heterogeneous HPC and cloud environments, with practical implications for accelerating data-intensive workflows.

Abstract

Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area references that can resolve to data regardless of location, has been demonstrated as an effective low-level building block in such situations. Here we propose three high-level proxy-based programming patterns -- distributed futures, streaming, and ownership -- that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.
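The transparent proxy idea in the abstract can be illustrated with a minimal sketch. This is not the ProxyStore API: the `LazyProxy` class, the in-memory `STORE` dictionary standing in for a remote object store, and the `put` helper are all hypothetical names chosen for illustration. The sketch only shows the core behavior, a cheap reference that fetches its target on first use and then forwards operations to it.

```python
from typing import Any, Callable

# Hypothetical stand-in for a remote object store (assumption, not ProxyStore).
STORE: dict[str, Any] = {}
# Records every fetch so the lazy behavior can be observed.
FETCHES: list[str] = []


class LazyProxy:
    """Illustrative lazy reference: resolves its target on first use."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._target: Any = None
        self._resolved = False

    def _resolve(self) -> Any:
        # Fetch the target once, then reuse the cached object.
        if not self._resolved:
            self._target = self._factory()
            self._resolved = True
        return self._target

    def __getattr__(self, name: str) -> Any:
        # Only called when normal lookup fails, i.e. for target attributes.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._resolve(), name)

    # Special methods bypass __getattr__, so they are forwarded explicitly.
    def __len__(self) -> int:
        return len(self._resolve())

    def __iter__(self):
        return iter(self._resolve())

    def __getitem__(self, key: Any) -> Any:
        return self._resolve()[key]


def put(key: str, obj: Any) -> LazyProxy:
    """Store obj under key and return a proxy that fetches it on demand."""
    STORE[key] = obj

    def fetch() -> Any:
        FETCHES.append(key)
        return STORE[key]

    return LazyProxy(fetch)


if __name__ == "__main__":
    proxy = put("sample", [1, 2, 3])
    print(FETCHES)     # nothing fetched yet
    print(sum(proxy))  # first use triggers the fetch
    print(FETCHES)
```

A task that receives `proxy` instead of the list can use it like the list itself, and the data is moved only when and where it is first touched. A production proxy forwards far more special methods than the three shown here.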

Paper Structure (14 sections, 10 figures)


Figures (10)

  • Figure 1: Overview of the three proxy-based data flow patterns we design.
  • Figure 2: Overview of the ProxyStore interface and abstraction stack with our contributions included in the shaded boxes.
  • Figure 3: Four tasks executed in a sequential (above) or pipelined (below) fashion. Each task produces data needed by the following task. The grey region at the start of each task represents startup overhead before the input data can be used. By enabling a successor task to start before its predecessor has finished, futures enable overlapping of startup overhead with computation, a form of pipelining.
  • Figure 4: The StreamProducer abstracts low-level communication details from the StreamConsumer and transparently decouples metadata from bulk data transfer. Yielding proxies, rather than objects directly, in the StreamConsumer enables just-in-time resolution and pass-by-reference optimizations.
  • Figure 5: Results for synthetic benchmark with 8 tasks, each sleeping for 1 s and communicating 10 MB to its successor, and with overhead fraction $f$ determining how much of the 1 s can be overlapped with its predecessor task. (Top) Task execution schedules in four scenarios: sequential no proxy, with delays due to workflow engine submission costs; sequential proxy, with proxies enabling immediate task start after proxy is resolved; and two pipelined ProxyFuture cases ($f=0.2$ and $f=0.5$), in which distributed futures relax strict inter-task dependencies and enable pipelining to overlap initial task overheads. The overhead and compute sleeps dominate in all cases. The times to resolve task input data and receive task results increase with overhead fraction, while makespan decreases due to pipelining overlap. (Bottom) Synthetic benchmark makespan vs. overhead fraction, for no proxy, proxy, and ProxyFuture scenarios. Each value is averaged over five runs; standard deviations are all less than 20 ms.
  • ...and 5 more figures
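
The pipelining effect described in the captions for Figures 3 and 5 can be approximated with a small idealized model. The sketch below is an assumption-laden illustration, not the paper's benchmark code: it ignores communication time and workflow engine overhead, and it assumes each successor starts exactly early enough for its startup overhead (a fraction `f` of the task time) to finish when its predecessor ends. The names `Interval`, `schedule`, and `makespan` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Start and end time of one task, in seconds."""
    start: float
    end: float


def schedule(n_tasks: int, task_time: float, f: float,
             pipelined: bool) -> list[Interval]:
    """Idealized schedule for a chain of tasks.

    f is the fraction of each task's time that is startup overhead and
    does not need the predecessor's output (assumed model, 0 <= f <= 1).
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be in [0, 1]")
    overhead = f * task_time
    intervals: list[Interval] = []
    prev_end = 0.0
    for i in range(n_tasks):
        if pipelined and i > 0:
            # Start early so the overhead overlaps the predecessor's tail.
            start = prev_end - overhead
        else:
            # Strict dependency: wait for the predecessor to finish.
            start = prev_end
        end = start + task_time
        intervals.append(Interval(start, end))
        prev_end = end
    return intervals


def makespan(intervals: list[Interval]) -> float:
    """Time from the first task's start to the last task's end."""
    return intervals[-1].end - intervals[0].start


if __name__ == "__main__":
    for f in (0.0, 0.2, 0.5):
        seq = makespan(schedule(8, 1.0, f, pipelined=False))
        pipe = makespan(schedule(8, 1.0, f, pipelined=True))
        print(f"f={f}: sequential={seq:.1f} s, pipelined={pipe:.1f} s")
```

Under these assumptions the pipelined makespan is $n t - (n-1) f t$ for $n$ tasks of duration $t$, so it falls linearly as $f$ grows. These are model predictions only; the measured values are those reported in the paper's Figure 5.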