Intelligent Switching for Reset-Free RL
Darshan Patil, Janarthanan Rajendran, Glen Berseth, Sarath Chandar
TL;DR
This work tackles the lack of environment resets in real-world RL by proposing RISC, a framework that intelligently switches between forward and reset controllers using a success critic that estimates the agent's competency. It treats timeouts as non-terminal, bootstrapping through them to keep learning targets stable, and introduces modulated switching to balance exploration and exploitation. Empirical results on the EARL benchmark and a four-room gridworld show that RISC achieves state-of-the-art performance among reset-free methods and improves robustly over its ablated variants. The approach offers practical advantages for real-world deployment by improving sample efficiency and reducing redundant exploration in regions the agent has already learned well.
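To make the timeout idea concrete, the following is a minimal sketch of a one-step TD target that treats timeouts as non-terminal. It is a generic illustration, not the authors' implementation. The `Transition` fields and the `td_target` function are hypothetical names, and the split between `terminated` and `truncated` follows the common convention of separating true task terminations from time-limit truncations.

```python
from dataclasses import dataclass


@dataclass
class Transition:
    # Hypothetical transition record for illustration only.
    reward: float
    next_value: float  # critic estimate for the next state, e.g. V(s')
    terminated: bool   # the task truly ended (e.g. goal reached)
    truncated: bool    # time limit or controller switch; deliberately ignored below


def td_target(t: Transition, gamma: float = 0.99) -> float:
    """One-step TD target that bootstraps through timeouts.

    Only a true termination removes the bootstrap term. A truncation
    (timeout or switch between controllers) is treated as non-terminal,
    so the target still includes gamma * V(s').
    """
    bootstrap = 0.0 if t.terminated else gamma * t.next_value
    return t.reward + bootstrap


# A timed-out step still bootstraps from the next state's value.
print(td_target(Transition(reward=0.0, next_value=10.0, terminated=False, truncated=True), gamma=0.9))
```

Because a controller switch cuts a trajectory short without the task actually ending, handling it like a truncation keeps `gamma * V(s')` in the target instead of silently zeroing it, which would bias value estimates downward near switch points.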
Abstract
The strong episode-resetting mechanisms used to train agents in simulation are unavailable in the real world. The resetting assumption limits the potential of reinforcement learning in the real world, since providing resets to an agent usually requires additional handcrafted mechanisms or human intervention. Recent work trains agents (forward agents) with learned resets by constructing a second (backward) agent that returns the forward agent to the initial state. We find that the termination and timing of the transitions between these two agents are crucial to the algorithm's success. With this in mind, we create a new algorithm, Reset Free RL with Intelligently Switching Controller (RISC), which intelligently switches between the two agents based on the agent's confidence in achieving its current goal. Our new method achieves state-of-the-art performance on several challenging environments for reset-free RL.
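The idea of switching based on confidence can be sketched as a stochastic rule driven by a success critic's estimate. This is a hypothetical illustration under stated assumptions, not RISC's exact rule. The names `should_switch`, `success_prob`, `beta`, and `max_steps` are invented here, and the linear form `beta * success_prob` is an assumption chosen only to show how higher confidence can trigger earlier switching away from goals the agent already reaches reliably.

```python
import random
from typing import Optional


def should_switch(
    success_prob: float,
    steps_in_phase: int,
    max_steps: int,
    beta: float = 0.5,
    rng: Optional[random.Random] = None,
) -> bool:
    """Hypothetical confidence-based switching rule (illustrative only).

    success_prob: assumed success-critic estimate, in [0, 1], that the
        current controller will reach its goal.
    beta: assumed modulation factor scaling how eagerly the agent
        switches away from goals it is already confident about.
    """
    # Hard time limit: always hand control to the other agent.
    if steps_in_phase >= max_steps:
        return True
    rng = rng if rng is not None else random.Random()
    # Clip the critic output to a valid probability before scaling.
    confidence = min(max(success_prob, 0.0), 1.0)
    p_switch = beta * confidence
    # Higher confidence means a higher chance of switching early.
    return rng.random() < p_switch


# Example call with a seeded generator for reproducibility.
rng = random.Random(0)
decision = should_switch(success_prob=0.9, steps_in_phase=12, max_steps=200, rng=rng)
```

With `beta` below 1, even a fully confident controller keeps acting some of the time, which is one simple way to modulate the trade-off between exploiting existing competence and exploring elsewhere.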
