Real-World Reinforcement Learning of Active Perception Behaviors
Edward S. Hu, Jie Wang, Xingfang Yuan, Fiona Luo, Muyao Li, Gaspard Lambrechts, Oleh Rybkin, Dinesh Jayaraman
TL;DR
The paper tackles learning active perception under partial observability by introducing AAWR, an asymmetric RL method that uses privileged information available only at training time to learn privileged value functions, which in turn guide the target policy. It provides a theoretical justification showing that AAWR aligns with constrained policy improvement in POMDPs and demonstrates strong real-world performance on 8 manipulation tasks across 3 robots, outperforming imitation and prior privileged-information baselines. The approach enables efficient online adaptation from suboptimal demonstrations and coarse initial policies, addressing sample efficiency and sim-to-real challenges in active perception. The work also discusses limitations and potential extensions to broader tasks and integration with foundation models. Overall, AAWR offers a practical, theory-backed pathway to robust active perception on real robots under partial observability.
Abstract
A robot's instantaneous sensory observations do not always reveal task-relevant state information. Under such partial observability, optimal behavior typically involves explicitly acting to gain the missing information. Today's standard robot learning techniques struggle to produce such active perception behaviors. We propose a simple real-world robot learning recipe to efficiently train active perception policies. Our approach, asymmetric advantage weighted regression (AAWR), exploits access to "privileged" extra sensors at training time. The privileged sensors enable training high-quality privileged value functions that aid in estimating the advantage of the target policy. Bootstrapping from a small number of potentially suboptimal demonstrations and an easy-to-obtain coarse policy initialization, AAWR quickly acquires active perception behaviors and boosts task performance. In evaluations on 8 manipulation tasks on 3 robots spanning varying degrees of partial observability, AAWR synthesizes reliable active perception behaviors that outperform all prior approaches. When initialized with a "generalist" robot policy that struggles with active perception tasks, AAWR efficiently generates information-gathering behaviors that allow it to operate under severe partial observability for manipulation tasks. Website: https://penn-pal-lab.github.io/aawr/
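To make the role of the privileged value functions concrete, here is a minimal sketch of an advantage-weighted regression loss in which the advantage comes from critics that see privileged state, while the policy's log-probabilities are computed from partial observations only. The exponential weighting is the standard advantage weighted regression form; the function names, the temperature `beta`, and the weight clip `max_weight` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def awr_weights(q_priv, v_priv, beta=1.0, max_weight=20.0):
    """Exponentiated-advantage weights from privileged critic estimates.

    q_priv and v_priv are assumed to be outputs of critics that had access
    to privileged training-time sensors. beta and max_weight are
    hypothetical hyperparameters chosen for illustration.
    """
    advantage = np.asarray(q_priv, dtype=float) - np.asarray(v_priv, dtype=float)
    # Clip in log space so that large advantages cannot overflow exp().
    return np.exp(np.minimum(advantage / beta, np.log(max_weight)))


def weighted_policy_loss(log_probs, weights):
    """Negative advantage-weighted log-likelihood of the dataset actions.

    log_probs are log pi(a | o) for the target policy, which is assumed to
    condition only on the partial observation o, never on privileged state.
    """
    log_probs = np.asarray(log_probs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return -float(np.mean(weights * log_probs))


# Toy batch of three transitions: the third has a very large advantage,
# so its weight is clipped at max_weight.
weights = awr_weights(q_priv=[1.0, 0.0, 10.0], v_priv=[0.0, 0.0, 0.0])
loss = weighted_policy_loss(log_probs=[-1.0, -2.0, -0.5], weights=weights)
```

The asymmetry in this sketch is that only the critics consume privileged inputs, so the resulting policy can still be deployed with its ordinary sensors.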
