Automatic Environment Shaping is the Next Frontier in RL
Younghyo Park, Gabriel B. Margolis, Pulkit Agrawal
TL;DR
The paper identifies manual environment shaping as the main bottleneck in robotics RL and argues that automating it is the path to generalization. It formalizes environment shaping as a bilevel optimization: an outer shaping function f transforms a reference environment E^ref into a more learnable environment E^shaped = f(E^ref), and an inner RL optimization finds a policy π^* that maximizes reward on E^shaped. The outer objective is evaluated on a test environment E^test: max_{f∈F} J(π^*; E^test) subject to π^* ∈ argmax_π J(π; E^shaped). The paper describes a four-subtask workflow (modeling sample environments, shaping, RL training, and evaluation/reflection), surveys current practice to show that automating reward design alone is insufficient, and proposes paths forward, including scalable outer-loop search, better priors, online shaping, and unshaped robotics benchmarks. It also advocates concrete tooling for shaping experiments and benchmarks that measure the total cost of applying RL to real-world tasks, with the aim of reducing human effort and improving robustness across tasks and domains.
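To make the bilevel structure concrete, the following is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption made for illustration: the one-step environment `Env`, the goal `TARGET`, the shaping family `make_shaping` (a single distance-penalty weight), deterministic hill climbing as a stand-in for RL training in `inner_train`, and exhaustive search over three candidates as a stand-in for the outer loop in `outer_search`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

TARGET = 3.0  # hypothetical goal of the toy task (an assumption of this sketch)


@dataclass(frozen=True)
class Env:
    """One-step toy environment: a 'policy' is just a single action a."""
    reward: Callable[[float], float]


def sparse_reward(a: float) -> float:
    """Task reward: 1 only when the action lands within 0.1 of the target."""
    return 1.0 if abs(a - TARGET) < 0.1 else 0.0


E_REF = Env(sparse_reward)   # reference environment E^ref
E_TEST = Env(sparse_reward)  # test environment E^test (unshaped)


def make_shaping(weight: float) -> Callable[[Env], Env]:
    """Build a shaping function f: adds a distance term scaled by `weight`."""
    def f(env: Env) -> Env:
        return Env(lambda a: env.reward(a) - weight * abs(a - TARGET))
    return f


def J(policy: float, env: Env) -> float:
    """Objective J(pi; E): reward of the policy's action in env."""
    return env.reward(policy)


def inner_train(env: Env, steps: int = 200, step_size: float = 0.05) -> float:
    """Inner loop: deterministic hill climbing stands in for RL training."""
    policy = 0.0
    best = J(policy, env)
    for _ in range(steps):
        for delta in (step_size, -step_size):
            candidate = policy + delta
            value = J(candidate, env)
            if value > best:  # accept only strict improvements
                policy, best = candidate, value
                break
    return policy


def outer_search(weights: Sequence[float]) -> Tuple[Dict[float, float], float]:
    """Outer loop: score each shaping by J(pi*; E^test) and keep the best."""
    scores: Dict[float, float] = {}
    best_weight, best_score = weights[0], float("-inf")
    for w in weights:
        shaped_env = make_shaping(w)(E_REF)  # E^shaped = f(E^ref)
        policy = inner_train(shaped_env)     # pi* from the inner problem
        scores[w] = J(policy, E_TEST)        # outer objective on E^test
        if scores[w] > best_score:
            best_weight, best_score = w, scores[w]
    return scores, best_weight


if __name__ == "__main__":
    print(outer_search([-1.0, 0.0, 1.0]))
```

In this toy, the unshaped environment (weight 0.0) gives the hill climber no signal, and a misleading shaping (weight -1.0) drives it away from the goal. Only the positive weight yields a policy that succeeds on the unshaped test environment, which is why the outer objective is scored on E^test rather than on E^shaped.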
Abstract
Many roboticists dream of presenting a robot with a task in the evening and returning the next morning to find the robot capable of solving the task. What is preventing us from achieving this? Sim-to-real reinforcement learning (RL) has achieved impressive performance on challenging robotics tasks, but requires substantial human effort to set up the task in a way that is amenable to RL. It's our position that algorithmic improvements in policy optimization and other ideas should be guided towards resolving the primary bottleneck of shaping the training environment, i.e., designing observations, actions, rewards, and simulation dynamics. Most practitioners tune not the RL algorithm but other environment parameters to obtain a desirable controller. We posit that scaling RL to diverse robotic tasks will only be achieved if the community focuses on automating environment shaping procedures.
