Using Process Calculus for Optimizing Data and Computation Sharing in Complex Stateful Parallel Computations

Zilu Tian, Dan Olteanu, Christoph Koch

TL;DR

This paper tackles the challenge of accelerating complex stateful parallel computations, notably agent-based simulations, by introducing behavioral equations built on the $\pi$-calculus within a BSP setting. It presents OptiFusion, a compile-time program specialization framework that transforms a simple, generic agent program into highly optimized, partition-aware implementations through a sequence of rewriting steps and annotations for data and computation placement. The approach yields substantial performance improvements: over $10\times$ faster execution than state-of-the-art stateful systems and up to $2\times$ faster than hand-optimized implementations, while also enabling data sharing and computation sharing that are impractical to write by hand. These results demonstrate a principled, formal foundation for parallel program optimization with strong practical impact for large-scale, complex stateful workloads.

Abstract

We propose novel techniques that exploit data and computation sharing to improve the performance of complex stateful parallel computations, like agent-based simulations. Parallel computations are translated into behavioral equations, a novel formalism layered on top of the foundational process calculus $\pi$-calculus. Behavioral equations blend code and data, allowing a system to easily compose and transform parallel programs into specialized programs. We show how optimizations like merging programs, synthesizing efficient message data structures, eliminating local messaging, rewriting communication instructions into local computations, and aggregation pushdown can be expressed as transformations of behavioral equations. We have also built a system called OptiFusion that implements behavioral equations and the aforementioned optimizations. Our experiments showed that OptiFusion is over 10$\times$ faster than state-of-the-art stateful systems benchmarked via complex stateful workloads. Generating specialized instructions that are impractical to write by hand allows OptiFusion to outperform even the hand-optimized implementations by up to 2$\times$.
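To make the idea of aggregation pushdown concrete, the following is a minimal Python sketch, not OptiFusion's implementation and not expressed in behavioral equations. It assumes a single BSP superstep in which agents send numeric messages that each receiver sums. The function names `naive_superstep` and `pushdown_superstep`, the message tuple layout, and the `partition_of` mapping are all hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def naive_superstep(messages, partition_of):
    """Deliver every (src, dst, value) message individually; receivers sum them.

    Returns the per-receiver totals and the number of messages that
    crossed a partition boundary.
    """
    inbox = defaultdict(list)
    cross_partition = 0
    for src, dst, value in messages:
        if partition_of[src] != partition_of[dst]:
            cross_partition += 1
        inbox[dst].append(value)
    totals = {dst: sum(values) for dst, values in inbox.items()}
    return totals, cross_partition


def pushdown_superstep(messages, partition_of):
    """Push the sum down to the sender side (hypothetical sketch).

    Messages from the same source partition to the same receiver are
    combined before sending, so at most one value per
    (source partition, receiver) pair crosses a partition boundary.
    """
    partial = defaultdict(int)
    for src, dst, value in messages:
        partial[(partition_of[src], dst)] += value

    totals = defaultdict(int)
    cross_partition = 0
    for (src_part, dst), value in partial.items():
        if src_part != partition_of[dst]:
            cross_partition += 1
        totals[dst] += value
    return dict(totals), cross_partition
```

Both functions compute the same totals. The sketch only counts how many values cross partition boundaries, which is the cost that this kind of rewriting aims to reduce. It relies on the aggregate being associative and commutative, as a sum is.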

Paper Structure

This paper contains 42 sections, 16 equations, 13 figures.

Figures (13)

  • Figure 1: A performance gap remains between stateless parallel systems (Spark and GraphX) and stateful parallel systems (Flink, Giraph, and CloudCity) when benchmarked using complex stateful workloads. The workloads are simulations of population dynamics, economics, and epidemics, where the social graph is generated from the Erdős-Rényi random graph model (abbreviated ERM) and the stochastic block random model (abbreviated SBM). These experiments reproduce the scale-up results in [tian23generalizing] using the latest software versions. By exploiting data and computation sharing optimizations, OptiFusion is over 500$\times$ faster than stateless systems and 10$\times$ faster than stateful systems. The performance of OptiFusion is comparable with the hand-optimized implementations and can exceed it by up to 2$\times$.
  • Figure 2: Structural congruence rules for $\pi$-calculus.
  • Figure 3: Illustration of how various optimizations transform computation trees.
  • Figure 4: System architecture of OptiFusion.
  • Figure 5: Agent definition for population dynamics example using $\mathtt{BSP}$ and $\mathtt{ComputeMethod}$ in OptiFusion.
  • ...and 8 more figures