Crash-Stop Failures in Asynchronous Multiparty Session Types
Adam D. Barwell, Ping Hou, Nobuko Yoshida, Fangyi Zhou
TL;DR
This work extends asynchronous multiparty session types (MPST) to model crash-stop failures in distributed systems by introducing crash-handling branches in global and local types and a dedicated stop type for crashed endpoints. The framework supports optional reliability assumptions via a configurable set of reliable roles, and it preserves the correctness-by-construction guarantees (safety, deadlock-freedom, and liveness) through a top-down design that links global specifications to local implementations. It provides a formal asynchronous calculus with crash-aware operational semantics and a corresponding type system, and it relates global-type reductions to configurations through projection and association theorems. A case study on Non-Blocking Atomic Commits demonstrates applicability to distributed transactions, and the discussion of alternatives and related work clarifies the design choices and scope. The results lay the groundwork for future extensions to crash-recover models and broader failure modes in MPST-based distributed programming.
Abstract
Session types provide a typing discipline for message-passing systems. However, their theory often assumes an ideal world: one in which everything is reliable and without failures. This is in stark contrast with distributed systems in the real world. To address this limitation, we introduce a new asynchronous multiparty session types (MPST) theory with crash-stop failures, where processes may crash arbitrarily and cease to interact after crashing. We augment asynchronous MPST and processes with crash-handling branches, and integrate crash-stop failure semantics into types and processes. Our approach requires no user-level syntax extensions for global types, and features a formalisation of global semantics, which captures complex behaviours induced by crashed and crash-handling processes. Our new theory covers the entire spectrum, ranging from the ideal world of total reliability to entirely unreliable scenarios where any process may crash, using optional reliability assumptions. Under these assumptions, we demonstrate the sound and complete correspondence between global and local type semantics, which guarantees deadlock-freedom, protocol conformance, and liveness of well-typed processes by construction, even in the presence of crashes.
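To make the role of optional reliability assumptions concrete, the following is a minimal illustrative sketch, not the formal definition used in this work. It assumes a toy representation of global types in which a crash-handling branch is an ordinary branch carrying a special label (here called `crash`), and it assumes that a receiver needs such a branch whenever the sender is not in the set of reliable roles. The names `End`, `Msg`, `CRASH`, and `missing_crash_branches` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple, Union

# Assumed name of the special label marking a crash-handling branch.
CRASH = "crash"


@dataclass
class End:
    """Terminated protocol."""


@dataclass
class Msg:
    """The sender selects one labelled branch and sends it to the receiver."""
    sender: str
    receiver: str
    branches: Dict[str, "GlobalType"]


GlobalType = Union[End, Msg]


def missing_crash_branches(
    g: GlobalType, reliable: FrozenSet[str]
) -> List[Tuple[str, str]]:
    """List (sender, receiver) interactions that lack a crash branch
    even though the sender is not assumed to be reliable."""
    if isinstance(g, End):
        return []
    missing: List[Tuple[str, str]] = []
    if g.sender not in reliable and CRASH not in g.branches:
        missing.append((g.sender, g.receiver))
    # Check every continuation, including the crash-handling one.
    for continuation in g.branches.values():
        missing.extend(missing_crash_branches(continuation, reliable))
    return missing


# Toy protocol: C sends `req` to S, then S answers `ok` to C.
# Only the first interaction has a crash-handling branch.
protocol = Msg("C", "S", {
    "req": Msg("S", "C", {"ok": End()}),
    CRASH: End(),
})

# If S is assumed reliable, no crash branch is missing.
print(missing_crash_branches(protocol, frozenset({"S"})))
# If no role is assumed reliable, the S-to-C interaction is flagged.
print(missing_crash_branches(protocol, frozenset()))
```

In this sketch, enlarging the reliable set removes obligations to handle crashes, so the same toy protocol passes the check when every role is reliable and is flagged when none is. This mirrors the spectrum from total reliability to entirely unreliable scenarios described above.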
