TreeIRL: Safe Urban Driving with Tree Search and Inverse Reinforcement Learning
Momchil S. Tomov, Sang Uk Lee, Hansford Hendrago, Jinwook Huh, Teawon Han, Forbes Howington, Rafael da Silva, Gianmarco Bernasconi, Marc Heim, Samuel Findler, Xiaonan Ji, Alexander Boule, Michael Napoli, Kuo Chen, Jesse Miller, Boaz Floor, Yunqing Hu
TL;DR
TreeIRL addresses the planning bottleneck in autonomous driving by pairing Monte Carlo tree search (MCTS), used as a trajectory generator, with an inverse reinforcement learning (IRL) scorer. MCTS is responsible for safety and exploration, while the IRL-based scorer, trained on expert driving, handles comfort and human-likeness, which enables robust performance in dense urban scenarios. Across large-scale simulations and more than 500 miles of real-world driving in the Las Vegas metropolitan area, the authors report that TreeIRL achieves superior safety, comparable progress, and improved comfort relative to baselines. To the authors' knowledge, this is the first demonstration of MCTS-based planning on public roads. The work emphasizes a holistic, multi-metric evaluation and highlights the sim-to-real gap, advocating on-road testing as a critical part of planner assessment and deployment. TreeIRL is presented as a scalable, extensible framework for blending classical planning with learning-based components.
Abstract
We present TreeIRL, a novel planner for autonomous driving that combines Monte Carlo tree search (MCTS) and inverse reinforcement learning (IRL) to achieve state-of-the-art performance in simulation and in real-world driving. The core idea is to use MCTS to find a promising set of safe candidate trajectories and a deep IRL scoring function to select the most human-like among them. We evaluate TreeIRL against both classical and state-of-the-art planners in large-scale simulations and on 500+ miles of real-world autonomous driving in the Las Vegas metropolitan area. Test scenarios include dense urban traffic, adaptive cruise control, cut-ins, and traffic lights. TreeIRL achieves the best overall performance, striking a balance between safety, progress, comfort, and human-likeness. To our knowledge, our work is the first demonstration of MCTS-based planning on public roads and underscores the importance of evaluating planners across a diverse set of metrics and in real-world environments. TreeIRL is highly extensible and could be further improved with reinforcement learning and imitation learning, providing a framework for exploring different combinations of classical and learning-based approaches to solve the planning bottleneck in autonomous driving.
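The generate-then-score idea described above can be illustrated with a toy sketch. This is not the authors' implementation: the 1-D car-following setup, the constants `DT`, `ACTIONS`, and `MIN_GAP`, and all function names are assumptions made for illustration. An exhaustive depth-limited enumeration of the action tree with safety pruning stands in for MCTS, and a hand-weighted linear reward over three features stands in for the deep IRL scoring function.

```python
from itertools import product

# All constants below are illustrative assumptions, not values from the paper.
DT = 1.0                      # seconds per planning step
ACTIONS = (-2.0, 0.0, 1.0)    # candidate accelerations in m/s^2
MIN_GAP = 5.0                 # minimum allowed gap to the lead vehicle in m


def rollout(pos, speed, accels):
    """Integrate a 1-D ego state under a sequence of accelerations."""
    states = []
    for a in accels:
        speed = max(0.0, speed + a * DT)   # no reversing
        pos = pos + speed * DT
        states.append((pos, speed, a))
    return states


def is_safe(states, lead_pos, lead_speed):
    """Reject a trajectory that ever gets closer than MIN_GAP to the lead."""
    for step, (pos, _, _) in enumerate(states, start=1):
        lead_at_step = lead_pos + lead_speed * step * DT  # constant-speed lead
        if lead_at_step - pos < MIN_GAP:
            return False
    return True


def generate_candidates(pos, speed, lead_pos, lead_speed, depth=3):
    """Stand-in for the tree search: enumerate the action tree, keep safe leaves."""
    candidates = []
    for accels in product(ACTIONS, repeat=depth):
        states = rollout(pos, speed, accels)
        if is_safe(states, lead_pos, lead_speed):
            candidates.append(states)
    return candidates


def features(states):
    """Hand-picked features: final position, mean |accel|, mean |jerk|."""
    accels = [a for _, _, a in states]
    progress = states[-1][0]
    effort = sum(abs(a) for a in accels) / len(accels)
    changes = sum(abs(b - a) for a, b in zip(accels, accels[1:]))
    jerk = changes / (max(1, len(accels) - 1) * DT)
    return (progress, effort, jerk)


def score(states, weights):
    """Stand-in for the learned scorer: a linear reward over the features."""
    return sum(w * f for w, f in zip(weights, features(states)))


def plan(pos, speed, lead_pos, lead_speed, weights):
    """Pick the highest-scoring safe candidate, or None if none is safe."""
    candidates = generate_candidates(pos, speed, lead_pos, lead_speed)
    if not candidates:
        return None
    return max(candidates, key=lambda s: score(s, weights))
```

Calling `plan(0.0, 10.0, 20.0, 5.0, weights=(1.0, -0.5, -0.5))` returns a list of `(position, speed, acceleration)` tuples for the selected trajectory. The split mirrors the division of labor in the abstract: the generator alone decides what is admissible, so the scorer only ranks trajectories that have already passed the safety check.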
