Self-Consistent Model-based Adaptation for Visual Reinforcement Learning

Xinning Zhou, Chengyang Ying, Yao Feng, Hang Su, Jun Zhu

TL;DR

SCMA tackles the challenge of visual distractions in visual reinforcement learning by introducing a policy-agnostic denoising model that transfers cluttered observations to clean ones. It optimizes the denoiser with an unsupervised distribution-matching objective, leveraging a pre-trained world model to estimate the clean observation distribution $p(o_{1:T}|a_{1:T})$. The method forms a two-stage pipeline: pre-train policies and world models in clean environments, then adapt to distractions by refining the denoiser using losses that enforce self-consistency and information preservation, with optional reward-driven guidance. Empirical results on DMControl, RL-ViGen, and real-world robot data show SCMA significantly narrows performance gaps across various distractions and improves sample efficiency in deployment. This plug-and-play approach offers practical robustness for real-world robotic control without requiring policy finetuning.

Abstract

Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.
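
The core idea, fitting a denoiser so that its outputs agree with what a frozen, pre-trained world model expects to see, can be illustrated with a toy sketch. This is not the authors' implementation. The function names (`world_model_predict`, `denoise`, `adapt_denoiser`), the running-sum dynamics, and the constant additive distraction are all assumptions made for illustration. SCMA itself uses learned networks and combines several losses.

```python
import numpy as np

def world_model_predict(actions):
    """Hypothetical frozen world model: predicts clean observations from actions.
    Toy dynamics (an assumption): the observation is the running sum of actions."""
    return np.cumsum(actions, axis=0)

def denoise(noisy_obs, bias):
    """Hypothetical denoiser: subtracts a learnable per-dimension offset."""
    return noisy_obs - bias

def adapt_denoiser(noisy_obs, actions, steps=200, lr=0.1):
    """Fit the denoiser offset by gradient descent while the world model stays fixed.
    No clean observations from the distracting environment are used."""
    bias = np.zeros(noisy_obs.shape[1])
    target = world_model_predict(actions)  # frozen model's prediction of clean observations
    for _ in range(steps):
        residual = denoise(noisy_obs, bias) - target
        # Loss: mean over time of the squared residual, summed over dimensions.
        grad = -2.0 * residual.mean(axis=0)
        bias -= lr * grad
    return bias

rng = np.random.default_rng(0)
actions = rng.normal(size=(50, 3))
clean = world_model_predict(actions)          # stands in for the clean environment
distraction = np.array([1.5, -0.7, 0.3])      # assumed constant visual offset
noisy = clean + distraction                   # cluttered observations
bias = adapt_denoiser(noisy, actions)
print(np.round(bias, 3))
```

In this toy setting the learned offset converges to the injected distraction, and the policy (not shown) would consume `denoise(noisy, bias)` without being modified, which mirrors the plug-and-play role the abstract describes.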

Paper Structure

This paper contains 58 sections, 2 theorems, 34 equations, 17 figures, 7 tables, and 1 algorithm.

Key Result

Theorem 1

Given $p(o_{1:T}|a_{1:T})$ and $p(o^n_{1:T}|a_{1:T})$, let $\mathcal{Q}$ denote the solution set of $\mathcal{L}_{\mathrm{KL}}$. Then $\mathcal{Q}$ equals the set of posterior denoising distributions of noise functions in $\mathcal{H}^p_{f_n}$.

Figures (17)

  • Figure 1: The graphical model of an NPOMDP, where $o_t$ and $o^n_t$ denote the clean and cluttered observations, respectively.
  • Figure 2: An overview of Self-Consistent Model-based Adaptation (SCMA). SCMA adapts the agent to distracting environments by transferring cluttered observations to clean ones with the denoising model $m_{\mathrm{de}}$. Leveraging a pre-trained world model, $m_{\mathrm{de}}$ can be efficiently optimized with self-consistent reconstruction, noisy reconstruction, and reward prediction losses.
  • Figure 3: Visualization of the raw observations and the denoising model's outputs in various distracting environments.
  • Figure 4: Performance curves of different algorithms in the $\mathrm{video\_hard}$ environment, where SCMA exhibits better final performance and sample efficiency.
  • Figure 5: Visualization of the raw observations and denoising model's outputs on real-world robot data.
  • ...and 12 more figures

Theorems & Definitions (5)

  • Definition 1
  • Theorem 1: Proof in the appendix (subsection on optimality)
  • proof
  • Lemma 1
  • proof