HALO: Fault-Tolerant Safety Architecture For High-Speed Autonomous Racing
Aron Harder, Amar Kulkarni, Madhur Behl
TL;DR
HALO is a fault-tolerant safety architecture for high-speed autonomous racing stacks that addresses runtime faults across perception, planning, control, and communication. It implements a four-node safety framework (Graceful Stop, Node Health Monitor, Topic Multiplexer, and Behavioral-Safety Monitor) driven by a Failure Mode, Effects, and Criticality Analysis (FMECA) and validated with real Indy Autonomous Challenge data. The results show HALO mitigating data-health, node-health, and behavioral-safety faults, enabling safer operation with controlled performance trade-offs. This work provides a generalizable approach to safety in autonomous cyber-physical systems and informs safety architectures for broader high-speed autonomous applications.
Abstract
The field of high-speed autonomous racing has seen significant advances in recent years. Competitions such as RoboRace and the Indy Autonomous Challenge give researchers a platform to develop software stacks for autonomous race vehicles capable of reaching speeds in excess of 170 mph. Ensuring the safety of these vehicles requires the software to continuously monitor for faults and erroneous operating conditions during high-speed operation, with the goal of mitigating any unreasonable risks posed by malfunctions in sub-systems and components. This paper presents a comprehensive overview of the HALO safety architecture, which has been implemented on a full-scale autonomous racing vehicle as part of the Indy Autonomous Challenge. The paper begins with a failure mode and criticality analysis of the perception, planning, control, and communication modules of the software stack. Specifically, we examine three types of faults: node-health, data-health, and behavioral-safety faults. To mitigate these faults, the paper then outlines HALO safety archetypes and runtime monitoring methods. Finally, the paper demonstrates the effectiveness of the HALO safety architecture for each fault type, using real-world data gathered from autonomous racing vehicle trials during multi-agent scenarios.
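To make the three fault types concrete, the following is a minimal Python sketch of how a runtime monitor might tell them apart. It is not HALO's implementation: the `Snapshot` fields, the `classify` function, the specific checks (heartbeat timeout, message age, cross-track error), and all threshold values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Fault(Enum):
    NODE_HEALTH = "node_health"              # a software node stopped responding
    DATA_HEALTH = "data_health"              # data on a topic is stale or invalid
    BEHAVIORAL_SAFETY = "behavioral_safety"  # the vehicle behaves unsafely


@dataclass
class Snapshot:
    """Hypothetical inputs to one monitoring cycle."""
    now: float                # current time, seconds
    last_heartbeat: float     # last heartbeat from a monitored node, seconds
    last_msg_stamp: float     # timestamp of the newest message on a topic, seconds
    cross_track_error: float  # lateral deviation from the planned path, metres


# Hypothetical thresholds, chosen only for illustration.
HEARTBEAT_TIMEOUT_S = 0.5
MAX_MSG_AGE_S = 0.2
MAX_CROSS_TRACK_M = 2.0


def classify(s: Snapshot) -> Optional[Fault]:
    """Return the first detected fault class, or None if all checks pass."""
    if s.now - s.last_heartbeat > HEARTBEAT_TIMEOUT_S:
        return Fault.NODE_HEALTH
    if s.now - s.last_msg_stamp > MAX_MSG_AGE_S:
        return Fault.DATA_HEALTH
    if abs(s.cross_track_error) > MAX_CROSS_TRACK_M:
        return Fault.BEHAVIORAL_SAFETY
    return None
```

In this sketch the checks are ordered so that a silent node is reported before stale data, since a dead node would typically also produce stale data. That ordering is a design choice of the example, not something stated in the paper.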
