AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
Yucen Wang, Shenghua Wan, Le Gan, Shuai Feng, De-Chuan Zhan
TL;DR
This work targets visual RL under distractors, focusing on homogeneous distractors that visually resemble the controllable agent. It introduces the Implicit-Action Block MDP (IABMDP) and the Implicit Action Generator (IAG), which infers the implicit actions of distractors. With these, AD3 trains separated world models, one conditioned on agent actions and one on implicit distractor actions. AD3 demonstrates superior performance across DeepMind Control Suite tasks with both heterogeneous and homogeneous distractors, and extensive ablations show that the implicit actions play a critical role and carry interpretable semantics. The approach is plug-and-play and can be integrated with other model-based RL backbones, with practical implications for robust visual control in distraction-rich settings.
Abstract
Model-based methods have significantly contributed to distinguishing task-irrelevant distractors in visual control. However, prior research has focused primarily on heterogeneous distractors such as noisy background videos. Homogeneous distractors, which closely resemble the controllable agent, remain largely unexplored and pose significant challenges to existing methods. To tackle this problem, we propose the Implicit Action Generator (IAG) to learn the implicit actions of visual distractors, and present a new algorithm, implicit Action-informed Diverse visual Distractors Distinguisher (AD3), which leverages the actions inferred by IAG to train separated world models. Implicit actions effectively capture the behavior of background distractors, which helps distinguish the task-irrelevant components and lets the agent optimize its policy within the task-relevant state space. Our method achieves superior performance on various visual control tasks featuring both heterogeneous and homogeneous distractors. The indispensable role of the implicit actions learned by IAG is also empirically validated.
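
The separation described above can be read, roughly, as two latent transition models driven by different action signals. The following is only a sketch in notation chosen here for illustration, not the paper's own formulation: the symbols $s^{+}_t$ (task-relevant state), $s^{-}_t$ (task-irrelevant state), $\tilde{a}_t$ (implicit action), and the parameter names $\phi, \theta, \psi$ are assumptions, as is the choice of consecutive observations $o_t, o_{t+1}$ as the input from which the implicit action is inferred.

```latex
\begin{align}
  \tilde{a}_t &\sim q_{\phi}\!\left(\tilde{a}_t \mid o_t,\, o_{t+1}\right)
    && \text{(implicit distractor action, inferred by IAG)} \\
  s^{+}_{t+1} &\sim p_{\theta}\!\left(s^{+}_{t+1} \mid s^{+}_t,\, a_t\right)
    && \text{(task-relevant world model, conditioned on the agent action)} \\
  s^{-}_{t+1} &\sim p_{\psi}\!\left(s^{-}_{t+1} \mid s^{-}_t,\, \tilde{a}_t\right)
    && \text{(task-irrelevant world model, conditioned on the implicit action)}
\end{align}
```

Under this reading, the policy would be optimized using only $s^{+}_t$, which is what the abstract means by optimizing within the task-relevant state space.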
