Self-Consistent Model-based Adaptation for Visual Reinforcement Learning
Xinning Zhou, Chengyang Ying, Yao Feng, Hang Su, Jun Zhu
TL;DR
SCMA tackles visual distractions in visual reinforcement learning by introducing a policy-agnostic denoising model that maps cluttered observations to clean ones. It optimizes the denoiser with an unsupervised distribution-matching objective, using a pre-trained world model to estimate the distribution of clean observations $p(o_{1:T} \mid a_{1:T})$. The method is a two-stage pipeline. First, policies and world models are pre-trained in clean environments. Second, the agent adapts to distractions by optimizing the denoiser with losses that enforce self-consistency and information preservation, optionally guided by rewards. Empirical results on DMControl, RL-ViGen, and real-world robot data show that SCMA substantially narrows the performance gap caused by various distractions and improves sample efficiency during deployment. Because it is plug-and-play, the approach offers practical robustness for real-world robotic control without requiring policy fine-tuning.
Abstract
Visual reinforcement learning agents often suffer severe performance degradation in real-world applications due to visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that enables robust adaptation without modifying the policy. By mapping cluttered observations to clean ones with a denoising model, SCMA mitigates distractions for various policies as a plug-and-play enhancement. To train the denoising model without supervision, we derive a distribution-matching objective and theoretically analyze its optimality. We further present a practical algorithm that optimizes this objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and improves sample efficiency.
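To make the idea of unsupervised distribution matching with a pre-trained world model concrete, here is a deliberately tiny toy sketch. It is not the authors' implementation. Every name and modeling choice in it is a hypothetical assumption made for illustration:

- a "world model" that predicts the clean scalar observation for action $a_t$ as a Gaussian $\mathcal{N}(a_t, \sigma^2)$;
- a distraction that adds an unknown constant offset to every observation;
- a denoiser that subtracts a learned offset `theta`.

The denoiser is fit only from distracted observations and actions. It minimizes the world model's negative log-likelihood of the denoised observations, which corresponds to the cross-entropy part of a distribution-matching objective.

```python
import random

# Hypothetical toy setup (an assumption, not SCMA's actual models):
# the pre-trained "world model" says the clean observation for action a
# is distributed as N(a, SIGMA^2).
SIGMA = 0.1


def world_model_nll(o_clean, a):
    """Negative log-likelihood (up to a constant) of o_clean under N(a, SIGMA^2)."""
    return (o_clean - a) ** 2 / (2 * SIGMA ** 2)


def denoise(o_noisy, theta):
    """Hypothetical one-parameter denoiser: remove a learned constant offset."""
    return o_noisy - theta


def mean_nll(noisy_obs, actions, theta):
    """Average world-model NLL of the denoised observations."""
    total = sum(world_model_nll(denoise(x, theta), a) for x, a in zip(noisy_obs, actions))
    return total / len(noisy_obs)


def fit_denoiser(noisy_obs, actions, lr=1e-3, steps=500):
    """Gradient descent on mean_nll w.r.t. theta, using no clean targets."""
    theta = 0.0
    n = len(noisy_obs)
    for _ in range(steps):
        # d/dtheta of (x - theta - a)^2 / (2 SIGMA^2) = -(x - theta - a) / SIGMA^2
        grad = sum(-(x - theta - a) / SIGMA ** 2 for x, a in zip(noisy_obs, actions)) / n
        theta -= lr * grad
    return theta


# Usage: simulate clean data, add an offset the denoiser never sees directly.
rng = random.Random(0)
actions = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
true_offset = 0.7
noisy = [a + rng.gauss(0.0, SIGMA) + true_offset for a in actions]

theta = fit_denoiser(noisy, actions)
print(f"estimated offset: {theta:.3f}")
print(f"NLL before: {mean_nll(noisy, actions, 0.0):.1f}, after: {mean_nll(noisy, actions, theta):.3f}")
```

This toy denoiser has a single parameter, so maximizing likelihood alone recovers the offset. A flexible denoiser could instead collapse onto high-likelihood outputs that discard what the observation actually shows. Ruling out that failure is one plausible motivation for pairing the distribution-matching term with information-preservation losses, as described in the TL;DR.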
