Using Process Calculus for Optimizing Data and Computation Sharing in Complex Stateful Parallel Computations
Zilu Tian, Dan Olteanu, Christoph Koch
TL;DR
This paper tackles the challenge of accelerating complex stateful parallel computations, notably agent-based simulations, by introducing behavioral equations built on the $\pi$-calculus within a BSP setting. It presents OptiFusion, a compile-time program specialization framework that transforms a simple, generic agent program into optimized, partition-aware implementations through a sequence of rewriting steps and annotations for data and computation placement. In the reported experiments, OptiFusion is over $10\times$ faster than state-of-the-art stateful systems and up to $2\times$ faster than hand-optimized implementations, because it generates specialized instructions for data and computation sharing that are impractical to write by hand. These results give parallel program optimization a formal foundation that also pays off in practice for large-scale, complex stateful workloads.
Abstract
We propose novel techniques that exploit data and computation sharing to improve the performance of complex stateful parallel computations, such as agent-based simulations. Parallel computations are translated into behavioral equations, a novel formalism layered on top of the $\pi$-calculus, a foundational process calculus. Behavioral equations blend code and data, allowing a system to easily compose and transform parallel programs into specialized programs. We show how optimizations such as merging programs, synthesizing efficient message data structures, eliminating local messaging, rewriting communication instructions into local computations, and aggregation pushdown can be expressed as transformations of behavioral equations. We have also built a system called OptiFusion that implements behavioral equations and the aforementioned optimizations. Our experiments show that OptiFusion is over $10\times$ faster than state-of-the-art stateful systems on complex stateful workloads. By generating specialized instructions that are impractical to write by hand, OptiFusion outperforms even hand-optimized implementations by up to $2\times$.
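To give a rough intuition for one of the listed optimizations, the following minimal Python sketch illustrates a generic reading of aggregation pushdown in a BSP-style superstep: values destined for the same receiver are combined on the sending partition before delivery, so fewer messages cross partition boundaries. This is an illustrative toy under stated assumptions, not OptiFusion's implementation or its behavioral-equation rewriting; the message layout `(sender_partition, receiver_id, value)`, the function names, and the use of summation as the aggregate are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy model: each message is (sender_partition, receiver_id, value),
# and every receiver only needs the sum of the values sent to it.

def naive_superstep(messages):
    """Deliver every message individually; each receiver sums its inbox."""
    inbox = defaultdict(list)
    for _sender_partition, receiver, value in messages:
        inbox[receiver].append(value)
    delivered = sum(len(values) for values in inbox.values())
    totals = {receiver: sum(values) for receiver, values in inbox.items()}
    return totals, delivered

def pushdown_superstep(messages):
    """Pre-aggregate per (sender partition, receiver) pair before delivery."""
    partial = defaultdict(int)
    for sender_partition, receiver, value in messages:
        partial[(sender_partition, receiver)] += value
    # Only one combined message per (partition, receiver) pair is delivered.
    totals = defaultdict(int)
    for (_sender_partition, receiver), subtotal in partial.items():
        totals[receiver] += subtotal
    return dict(totals), len(partial)

messages = [
    (0, "a", 1), (0, "a", 2), (1, "a", 3),
    (0, "b", 4), (1, "b", 5), (1, "b", 6),
]
naive_totals, naive_count = naive_superstep(messages)
pushed_totals, pushed_count = pushdown_superstep(messages)
```

Both variants compute the same per-receiver totals; the second delivers one message per (partition, receiver) pair instead of one per original message. The rewrite is only valid because the assumed aggregate (a sum) is associative and commutative.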
