Araucaria: Simplifying INC Fault Tolerance with High-Level Intents
Ricardo Parizotto, Israat Haque, Alberto Schaeffer-Filho
TL;DR
Araucaria tackles the difficulty of implementing fault tolerance for in-network computing (INC) with an intent-based refinement pipeline that translates high-level fault-tolerance goals into instrumented data-plane code and deployment configurations. Robustness requirements are written in a constrained natural language, compiled into an intermediate representation of building blocks, and then used to instrument the INC source and to generate concrete network configurations and control-plane coordination. The approach defines four reusable fault-tolerance blocks (Failure Detector, Replication, State Collection, Recovery) and supports both strong consistency and strong eventual consistency through configurable recovery strategies, including CRDT-based merges. The evaluation shows that Araucaria refines intents quickly, scales recovery across multiple replicas, and adds little overhead, with hardware experiments reporting recovery times of around 0.16 seconds in optimized scenarios. Overall, Araucaria offers a DevOps-friendly, scalable way to express and enforce INC fault tolerance through high-level intents, and it is deployable both in emulation and on real switching hardware.
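To make the idea of a CRDT-based merge concrete, the following is a minimal sketch of a grow-only counter, a standard conflict-free replicated data type, written in Python. It is not Araucaria's implementation: the function names (`increment`, `merge`, `value`) and the replica identifiers are assumptions chosen for illustration. It only shows why such merges suit strong eventual consistency: replicas can combine state in any order and still converge.

```python
from typing import Dict

# Hypothetical representation: one non-negative count per replica identifier.
GCounter = Dict[str, int]


def increment(counter: GCounter, replica: str, amount: int = 1) -> GCounter:
    """Return a copy of the counter with this replica's own slot increased."""
    if amount < 0:
        raise ValueError("a grow-only counter cannot decrease")
    updated = dict(counter)
    updated[replica] = updated.get(replica, 0) + amount
    return updated


def merge(a: GCounter, b: GCounter) -> GCounter:
    """Combine two replica states by taking the per-replica maximum.

    The operation is commutative, associative, and idempotent, so replicas
    that exchange state in any order end up with the same result.
    """
    return {key: max(a.get(key, 0), b.get(key, 0)) for key in a.keys() | b.keys()}


def value(counter: GCounter) -> int:
    """Total count observed across all replicas."""
    return sum(counter.values())


# Two replicas (names are illustrative) count packets independently.
switch_a = increment({}, "switch_a", 3)
switch_b = increment({}, "switch_b", 5)

# During recovery, the collected states are merged; order does not matter.
recovered = merge(switch_a, switch_b)
print(value(recovered))
```

Because the merge takes a per-replica maximum rather than a sum, replaying the same collected state twice during recovery does not double-count updates.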
Abstract
Network programmability allows fine-grained modification of data plane functionality. The performance benefits of data plane programmability have motivated many researchers to offload computation that previously ran only on servers to the network, giving rise to the notion of in-network computing (INC). Because failures can occur in the data plane, fault tolerance mechanisms are essential for INC. However, INC operators and developers must currently implement fault tolerance requirements by hand, relying on domain knowledge to change the source code. This manual process takes time and can introduce errors through misconfiguration. In this work, we present Araucaria, a system that aims to simplify the definition and implementation of fault tolerance requirements for INC. The system lets users specify requirements in an intent language, which expresses consistency and availability requirements in a constrained natural language. A refinement process translates the intent and incorporates the essential building blocks and configurations into the INC code. We present a prototype of Araucaria and analyze the end-to-end system behavior. Experiments demonstrate that the refinement scales to multiple intents and that the system provides fault tolerance with negligible overhead in failure scenarios.
