Table of Contents
Fetching ...

Accelerating Python Applications with Dask and ProxyStore

J. Gregory Pauloski, Klaudiusz Rydzy, Valerie Hayot-Sasson, Ian Foster, Kyle Chard

TL;DR

This work investigates integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows and details the technical contributions necessary to develop a robust solution for distributed applications.

Abstract

Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework, but is less optimized for scaling certain data-intensive workflows. We investigate these limitations and detail the technical contributions necessary to develop a robust solution for distributed applications and demonstrate improved performance on synthetic benchmarks and real applications.

Accelerating Python Applications with Dask and ProxyStore

TL;DR

This work investigates integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows and details the technical contributions necessary to develop a robust solution for distributed applications.

Abstract

Applications are increasingly written as dynamic workflows underpinned by an execution framework that manages asynchronous computations across distributed hardware. However, execution frameworks typically offer one-size-fits-all solutions for data flow management, which can restrict performance and scalability. ProxyStore, a middleware layer that optimizes data flow via an advanced pass-by-reference paradigm, has shown to be an effective mechanism for addressing these limitations. Here, we investigate integrating ProxyStore with Dask Distributed, one of the most popular libraries for distributed computing in Python, with the goal of supporting scalable and portable scientific workflows. Dask provides an easy-to-use and flexible framework, but is less optimized for scaling certain data-intensive workflows. We investigate these limitations and detail the technical contributions necessary to develop a robust solution for distributed applications and demonstrate improved performance on synthetic benchmarks and real applications.

Paper Structure

This paper contains 5 sections, 5 figures.

Figures (5)

  • Figure 1: Pass-by-proxy semantics reduce data flow through the Dask scheduler without altering application behavior.
  • Figure 2: ProxyStore is easily compatible with existing applications. Here we demonstrate the three integration patterns. The DAOSConnector, introduced in \ref{['sec:integration']}, is used, but this specific connector can be exchanged depending on the application requirements and execution environment.
  • Figure 3: (Left) No-op task round-trip time with various payload sizes. (Right) Relative improvement in round-trip time compared to the baseline when using ProxyStore.
  • Figure 4: (Left) No-op task throughput with various worker counts. Tasks consume and produce 1 MB of random data. (Right) Relative improvement in throughput compared to the baseline when using ProxyStore. ProxyStore alleviates data flow burdens from the Dask scheduler, enabling the scheduler to dispatch tasks faster.
  • Figure 5: ProxyStore can reduce Dask overheads applications that embed large objects in the task graph, such as the Cholesky decomposition example and federated learning simulation provided by TaPS.