Environment as Policy: Learning to Race in Unseen Tracks
Hongze Wang, Jiaxu Xing, Nico Messikommer, Davide Scaramuzza
TL;DR
The paper tackles the problem of generalizing drone-racing policies to unseen and dynamic tracks. It introduces an adaptive environment-shaping framework in which a separate Environment Policy curates training tracks, guided by a ranking-based reward that favors intermediate difficulty, while the Racing Policy learns time-optimal control through standard RL objectives. By alternating updates between the two policies and training in multiple parallel simulations, the approach yields a single racing policy that generalizes to diverse tracks and even to moving-gate scenarios, outperforming domain-randomization and curriculum baselines. Real-world experiments with precise state estimation demonstrate a 100% success rate on unseen tracks, highlighting significant gains in robustness and transfer for agile robotics in open, dynamic environments.
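The idea of a ranking-based reward that favors intermediate difficulty can be illustrated with a small, hypothetical sketch. It assumes that each candidate track is scored by the racing policy's success rate on it, and that "intermediate difficulty" means a success rate near a target of 0.5. Tracks are ranked by their distance to that target, and the rank is mapped linearly to a reward in [0, 1]. The function name `ranking_reward`, the target value, and the linear rank-to-reward mapping are illustrative assumptions, not the formulation used in the paper.

```python
# Hypothetical ranking-based reward favoring intermediate difficulty.
# The success-rate signal, target=0.5, and the linear rank-to-reward
# mapping are illustrative assumptions, not the paper's formulation.
from typing import List


def ranking_reward(success_rates: List[float], target: float = 0.5) -> List[float]:
    """Return one reward in [0, 1] per candidate track.

    Tracks whose success rate is closest to `target` rank first and get
    reward 1.0; a unique farthest track gets 0.0. Exact ties share an
    average rank, so equally difficult tracks receive equal reward.
    """
    n = len(success_rates)
    if n == 0:
        return []
    if n == 1:
        return [1.0]
    distances = [abs(s - target) for s in success_rates]
    order = sorted(range(n), key=lambda i: distances[i])

    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j over a run of exactly equal distances.
        j = i
        while j + 1 < n and distances[order[j + 1]] == distances[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Map rank 0 -> 1.0 and rank n-1 -> 0.0 linearly.
    return [1.0 - r / (n - 1) for r in ranks]
```

For example, `ranking_reward([0.0, 0.45, 0.9, 1.0])` gives the full reward to the track with 45% success and equal, low rewards to the impossible track and the trivially easy one. Using ranks rather than raw success rates keeps the reward scale fixed in this sketch no matter how the racing policy's overall performance drifts during training.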
Abstract
Reinforcement learning (RL) has achieved outstanding success in complex robot control tasks such as drone racing, where RL agents have outperformed human champions on a known race track. However, these agents fail on unseen track configurations and require complete retraining whenever they are presented with a new track layout. This work aims to develop RL agents that generalize effectively to novel track configurations without retraining. The naive solution of training directly on a diverse set of track layouts can overburden the agent: the added complexity of the environment impairs its ability to learn to fly, resulting in a suboptimal policy. To improve the generalizability of the RL agent, we propose an adaptive environment-shaping framework that dynamically adjusts the training environment based on the agent's performance. We achieve this by using a secondary RL policy to design environments that strike a balance between being challenging and achievable, allowing the agent to adapt and improve progressively. With our adaptive environment shaping, a single racing policy efficiently learns to race on diverse, challenging tracks. Experiments in both simulation and the real world show that our method enables drones to successfully fly complex, unseen race tracks, outperforming existing environment-shaping techniques. Project page: http://rpg.ifi.uzh.ch/env_as_policy.
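The interplay between a secondary environment-designing policy and the racing agent can be illustrated with a toy, self-contained sketch. Every component is a hypothetical stand-in chosen for brevity: the environment policy is a softmax bandit over eleven discrete difficulty levels updated with REINFORCE, the racing policy is collapsed into a single scalar `skill`, success on a track follows an assumed logistic model, and the environment reward simply peaks when the success probability is near 50%. Each iteration samples one track per parallel simulation, updates the racing stand-in with the environment policy held fixed, and then updates the environment policy with the racing stand-in held fixed. None of these names, constants, or learning rules come from the paper.

```python
# Toy sketch of adaptive environment shaping with alternating updates.
# All components are hypothetical stand-ins, not the paper's method.
import math
import random
from typing import List, Tuple


def success_prob(skill: float, difficulty: float) -> float:
    # Assumed logistic model: tracks far above the current skill rarely succeed.
    return 1.0 / (1.0 + math.exp(4.0 * (difficulty - skill)))


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(iterations: int = 200, num_envs: int = 16,
          seed: int = 0) -> Tuple[float, List[float]]:
    rng = random.Random(seed)
    difficulties = [0.5 * k for k in range(11)]  # levels 0.0, 0.5, ..., 5.0
    n = len(difficulties)
    logits = [0.0] * n  # environment-policy parameters (uniform at start)
    skill = 0.0         # scalar stand-in for the racing policy
    lr_env, lr_race = 0.5, 0.05

    for _ in range(iterations):
        probs = softmax(logits)
        # Environment policy proposes one track per parallel simulation.
        choices = rng.choices(range(n), weights=probs, k=num_envs)
        ps = [success_prob(skill, difficulties[c]) for c in choices]

        # Step 1: update the racing stand-in with the environment policy fixed.
        # Toy rule: progress 4p(1-p) is largest when success is near 50%.
        progress = sum(4.0 * p * (1.0 - p) for p in ps) / num_envs
        skill += lr_race * progress

        # Step 2: update the environment policy with the racing stand-in fixed,
        # using REINFORCE with a mean baseline on the same rollouts.
        rewards = [1.0 - 2.0 * abs(p - 0.5) for p in ps]
        baseline = sum(rewards) / num_envs
        grads = [0.0] * n
        for c, r in zip(choices, rewards):
            adv = r - baseline
            for k in range(n):
                # Gradient of log softmax(logits)[c] with respect to logits[k].
                grads[k] += adv * ((1.0 if k == c else 0.0) - probs[k])
        logits = [x + lr_env * g / num_envs for x, g in zip(logits, grads)]

    return skill, softmax(logits)


if __name__ == "__main__":
    final_skill, env_probs = train()
    print(f"skill={final_skill:.2f}")
    print("env policy:", [round(p, 3) for p in env_probs])
```

Running the script prints the final skill and the environment policy's distribution over difficulty levels. Because the environment reward in this toy is highest where success is near 50%, probability mass should tend to follow the growing skill toward harder levels, although the exact trajectory depends on the seed and the learning rates.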
