Stacked Universal Successor Feature Approximators for Safety in Reinforcement Learning
Ian Cannon, Washington Garcia, Thomas Gresavage, Joseph Saurine, Ian Leong, Jared Culbertson
TL;DR
This work introduces SUSFAS, a stacked universal successor feature approximator designed for safety-aware reinforcement learning in continuous control. By learning independent successor features for each objective component and integrating with soft actor-critic (SAC) and a runtime assurance (RTA) controller, SUSFAS improves secondary objectives such as fuel efficiency while maintaining primary task performance. Key contributions include expert stacking (learning the successor features for each component independently), extensive ablations against a collapsed USFA, and demonstrations that the presence of an RTA controller is crucial for realizing the gains from SUSFAS, with substantial fuel-efficiency improvements in mission-critical tasks. The results highlight the potential of stacking successor features to encode safety-controller behaviors and enable robust, multi-objective policies in safety-critical domains.
Abstract
Real-world problems often involve complex objective structures that resist distillation into reinforcement learning environments with a single objective. Operation costs must be balanced with multi-dimensional task performance and end-states' effects on future availability, all while ensuring safety for other agents in the environment and the reinforcement learning agent itself. System redundancy through secondary backup controllers has proven to be an effective method to ensure safety in real-world applications where the risk of violating constraints is extremely high. In this work, we investigate the utility of a stacked, continuous-control variation of universal successor feature approximation (USFA) adapted for soft actor-critic (SAC) and coupled with a suite of secondary safety controllers, which we call stacked USFA for safety (SUSFAS). Compared to SAC baselines, our method improves performance on secondary objectives when an intervening secondary controller, such as a runtime assurance (RTA) controller, is present.
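To make the successor-feature idea behind the abstract concrete, the following is a minimal sketch of the standard decomposition, not the authors' implementation. It assumes the reward is linear in a per-step feature vector, r_t = phi_t · w, so that the value of a trajectory under any weighting w is psi · w, where psi is the discounted sum of features computed separately for each objective component. The feature columns (task progress, fuel penalty), the weights, and the function names are hypothetical and chosen only for illustration; the actual method learns these quantities with neural approximators inside SAC.

```python
import numpy as np

def discounted_successor_features(phi, gamma):
    """Compute psi[t] = sum_{k >= t} gamma**(k - t) * phi[k] per component.

    phi is a (T, d) array of per-step features, one column per objective
    component. Each column is accumulated independently of the others.
    """
    phi = np.asarray(phi, dtype=float)
    psi = np.zeros_like(phi)
    running = np.zeros(phi.shape[1])
    # Sweep backwards so each step reuses the discounted tail after it.
    for t in range(phi.shape[0] - 1, -1, -1):
        running = phi[t] + gamma * running
        psi[t] = running
    return psi

def q_from_successor_features(psi, w):
    """Combine per-component successor features with task weights w."""
    return psi @ np.asarray(w, dtype=float)

# Hypothetical trajectory: column 0 = task progress, column 1 = fuel penalty.
phi = np.array([[1.0, -0.5],
                [0.0, -0.2],
                [2.0,  0.0]])
gamma = 0.9

psi = discounted_successor_features(phi, gamma)

# The same psi is reused for different objective weightings.
q_task_only = q_from_successor_features(psi, [1.0, 0.0])
q_fuel_aware = q_from_successor_features(psi, [1.0, 0.5])
```

Because psi is computed once per component, changing the weight on a secondary objective such as fuel use only changes the final dot product. This is the property that makes keeping the components separate, rather than collapsing them into a single scalar value, attractive for multi-objective settings.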
