A Promising Future: Omission Failures in Choreographic Programming
Eva Graversen, Fabrizio Montesi, Marco Peressotti
TL;DR
This work extends choreographic programming to networks with omission failures by introducing Lossy Choreographies (LC), which split each communication into independent send and receive actions. The two actions are linked by frames $(k,k')^{\mathsf{T}}:\mathsf{p}\to \mathsf{q}$, and the semantics is a labelled transition system (LTS) over configurations $\langle C,\Sigma, K\rangle$. A typing system and a static robustness analysis check delivery guarantees (at-most-once and exactly-once) under omission failures, and EndPoint Projection (EPP) compiles LC into lossy processes. The authors implement the approach as a library for Choral and illustrate its expressivity with examples including two-phase commit. Because every message is identified by its frame, each side of a communication can program its own failure recovery strategy while the choreography still guarantees that sends and receives correlate correctly.
Abstract
Choreographic programming promises a simple approach to the coding of concurrent and distributed systems: write the collective communication behaviour of a system of processes as a choreography, and then the programs for these processes are automatically compiled by a provably-correct procedure known as endpoint projection. While this promise prompted substantial research, a theory that can deal with realistic communication failures in a distributed network remains elusive. In this work, we provide the first theory of choreographic programming that addresses realistic communication failures taken from the literature of distributed systems: processes can send or receive fewer messages than they should (send and receive omission), and the network can fail at transporting messages (omission failure). Our theory supports the programming of strategies for failure recovery, and a novel static analysis (called robustness) to check for delivery guarantees (at-most-once and exactly-once). Our key technical innovation is a deconstruction of the usual communication primitive in choreographies to allow for independent implementations of the send and receive actions of a communication, while still retaining the static guarantee that these actions will correlate correctly (the essence of choreographic programming). This has two main benefits. First, each side of a communication can adopt its own failure recovery strategy, as in realistic protocols. Second, initiating new communications does not require any (unrealistic) synchronisation over unreliable channels: senders and receivers agree by construction on how each message should be identified. We validate our design via a series of examples -- including two-phase commit, which so far eluded choreographic programming -- and an implementation of our ideas in the choreographic programming language Choral.
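The split of a communication into independent send and receive actions can be illustrated with a small simulation. The following is a minimal sketch in Python, not the paper's formal model and not the Choral library's API: the names `LossyNetwork`, `sender`, and `receiver`, the loss model, and the retry counts are all assumptions made for illustration. It shows a sender and a receiver that agree on a frame identifier in advance, each with its own recovery strategy (bounded retransmission on one side, bounded polling with a default on the other).

```python
import random


class LossyNetwork:
    """Hypothetical network in which any send or receive may be omitted."""

    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)
        self.in_transit = {}  # frame identifier -> payload

    def send(self, frame, payload):
        # Send omission or network omission: the message is silently lost.
        if self.rng.random() >= self.loss_rate:
            # Retransmissions reuse the same frame, so duplicates collapse
            # into a single entry instead of being delivered twice.
            self.in_transit[frame] = payload

    def receive(self, frame):
        # Receive omission: the attempt may fail even if the message arrived.
        if frame in self.in_transit and self.rng.random() >= self.loss_rate:
            return self.in_transit[frame]
        return None


def sender(net, frame, payload, attempts):
    """Sender-side strategy (assumed): retransmit a fixed number of times."""
    for _ in range(attempts):
        net.send(frame, payload)


def receiver(net, frame, attempts, default):
    """Receiver-side strategy (assumed): poll, then fall back to a default.

    Payloads are assumed to be non-None, since None signals a failed attempt.
    """
    for _ in range(attempts):
        message = net.receive(frame)
        if message is not None:
            return message  # the frame's payload is consumed at most once
    return default


if __name__ == "__main__":
    frame = ("vote", 1)  # both sides know this identifier by construction
    net = LossyNetwork(loss_rate=0.3, seed=42)
    sender(net, frame, "commit", attempts=3)
    print(receiver(net, frame, attempts=3, default="abort"))
```

Neither function needs a handshake to agree on which message is meant, because the frame is fixed before either side runs. In this sketch the receiver obtains the payload at most once; whether it obtains it at all depends on the loss rate and the retry budgets, which is the kind of property the robustness analysis described in the abstract checks statically.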
