An Asynchronous Scheme for Rollback Recovery in Message-Passing Concurrent Programming Languages

Germán Vidal

TL;DR

The paper addresses rollback recovery in asynchronous, message-passing concurrent languages without shared memory and without central coordination. It extends the language with three operators, check, commit, and rollback, and develops an asynchronous checkpoint-based rollback mechanism that propagates through causal dependencies using forced checkpoints and system notifications. The author formalizes the semantics with extended histories, tagged messages, and a rollback relation, and proves soundness by showing that the extension is conservative over the standard semantics and corresponds to a reversible semantics. The author argues that this approach enables practical source-to-source instrumentation for fault tolerance in Erlang-like systems, and outlines future work toward a proof-of-concept implementation.

Abstract

Rollback recovery strategies are well known in concurrent and distributed systems. In this context, recovering from unexpected failures is even more relevant given the non-deterministic nature of execution, which means that it is practically impossible to foresee all possible process interactions. In this work, we consider a message-passing concurrent programming language where processes interact through message sending and receiving, but shared memory is not allowed. For such a language, we design a checkpoint-based rollback recovery strategy that does not need central coordination. For this purpose, we extend the language with three new operators: check, commit, and rollback. Furthermore, our approach is purely asynchronous, which is an essential ingredient for developing a source-to-source program instrumentation that implements a rollback recovery strategy.
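
To make the three operators concrete, the following Python sketch models check, commit, and rollback on toy processes that hold a single integer balance. It is an illustration under strong assumptions, not the paper's semantics: the `Process` class and every method name are hypothetical, message delivery and rollback notifications are plain synchronous method calls rather than asynchronous messages, and nested or overlapping checkpoints are not handled. It shows only the core idea that a message sent under an active checkpoint carries that checkpoint's tag, the receiver takes a forced checkpoint before processing it, and a later rollback is propagated to every process that received such a message.

```python
class Process:
    """Toy process with one integer of state (hypothetical, for illustration only)."""

    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance
        self.saved = {}       # checkpoint id -> balance saved at that checkpoint
        self.dependents = {}  # checkpoint id -> processes sent a message tagged with it
        self.counter = 0

    def _save(self, cp):
        self.saved[cp] = self.balance
        self.dependents[cp] = []

    def check(self):
        """Take a checkpoint and return its identifier."""
        cp = (self.name, self.counter)
        self.counter += 1
        self._save(cp)
        return cp

    def send(self, target, amount):
        """Send `amount` to `target`, tagged with all currently active checkpoints."""
        tags = list(self.saved)
        for cp in tags:
            if target not in self.dependents[cp]:
                self.dependents[cp].append(target)
        target.receive(amount, tags)

    def receive(self, amount, tags):
        """Take a forced checkpoint for each unknown tag, then process the message."""
        for cp in tags:
            if cp not in self.saved:
                self._save(cp)
        self.balance += amount

    def commit(self, cp):
        """Discard the checkpoint here and in every dependent process."""
        if cp not in self.saved:
            return
        del self.saved[cp]
        for proc in self.dependents.pop(cp):
            proc.commit(cp)

    def rollback(self, cp):
        """Restore the saved state and notify every dependent process."""
        if cp not in self.saved:
            return
        self.balance = self.saved.pop(cp)
        for proc in self.dependents.pop(cp):
            proc.rollback(cp)


client = Process("client", balance=100)
server = Process("server", balance=0)

cp = client.check()
client.balance -= 30
client.send(server, 30)   # server takes a forced checkpoint, then adds 30
client.rollback(cp)       # undoes the transfer on both sides
print(client.balance, server.balance)
```

In this sketch the rollback reaches the server only because the client recorded it as a dependent at send time; no global coordinator is consulted, which mirrors the decentralized flavor described in the abstract.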

Paper Structure (2 sections, 1 figure)

Figures (1)

  • Figure 1: Example program (bank account server)