Dejavu: Towards Experience Feedback Learning for Embodied Intelligence
Shaokai Wu, Yanbiao Ji, Qiuchang Li, Zhiyi Zhang, Qichen He, Wenyuan Xie, Guodong Zhang, Bayram Bayramli, Yue Ding, Hongtao Lu
TL;DR
The paper tackles post-deployment learning for frozen Vision-Language-Action policies by introducing the Experience Feedback Network (EFN), a retrieval-conditioned residual that augments a frozen backbone with an online, growing memory of past experiences. EFN is trained with Soft Actor-Critic and a dense similarity-based reward to learn small action corrections on top of the base policy, while an experience bank stores task-conditioned trajectories that guide decisions without changing the backbone. Across LIBERO, CALVIN, and real-world AgiBot-G1 tasks, EFN yields consistent improvements over baselines, including retrieval-only and test-time training methods, while incurring modest computational overhead. These results point to a practical, scalable path for deployment-time adaptation through memory growth rather than continual backbone finetuning.
Abstract
Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they cannot acquire additional knowledge to improve task performance. In this paper, we propose Dejavu, a general post-deployment learning framework that employs an Experience Feedback Network (EFN) to augment a frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN identifies contextually relevant prior action experiences and conditions action prediction on this retrieved guidance. We train EFN using reinforcement learning with semantic similarity rewards, so that the predicted actions align with past behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to "learn from experience". Experiments across diverse embodied tasks show that EFN improves adaptability, robustness, and success rates over frozen baselines. We provide code and a demo in our supplementary material.
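To make the mechanism described above concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that observations are already embedded as fixed-length vectors, that retrieval picks the nearest same-task entry by cosine similarity, that the correction is added to the frozen policy's action with a small scale factor, and that the reward is the cosine similarity between the resulting observation embedding and the retrieved experience's next-step embedding. All names (Experience, ExperienceBank, corrected_action, similarity_reward, the scale parameter) are hypothetical, and the Soft Actor-Critic training loop that would produce the residual is omitted.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class Experience:
    # Hypothetical record layout; the abstract does not specify one.
    task: str            # task identifier used to condition retrieval
    obs_emb: list        # embedding of the observation at this step
    action: list         # action that was executed
    next_obs_emb: list   # embedding of the observation that followed


@dataclass
class ExperienceBank:
    """Growing memory of past steps; the backbone itself is never updated."""
    items: list = field(default_factory=list)

    def add(self, exp):
        # Memory growth at deployment time: simply append new experience.
        self.items.append(exp)

    def retrieve(self, task, obs_emb):
        # Assumed retrieval rule: nearest same-task entry by cosine similarity.
        candidates = [e for e in self.items if e.task == task]
        if not candidates:
            return None
        return max(candidates, key=lambda e: cosine(e.obs_emb, obs_emb))


def corrected_action(base_action, residual, scale=0.1):
    """Frozen policy's action plus a small learned correction (assumed additive)."""
    return [a + scale * r for a, r in zip(base_action, residual)]


def similarity_reward(next_obs_emb, retrieved):
    """Assumed dense reward: how closely the outcome matches the retrieved outcome."""
    return cosine(next_obs_emb, retrieved.next_obs_emb)


# Toy usage with 2-D embeddings and 2-D actions.
bank = ExperienceBank()
bank.add(Experience("open_drawer", [1.0, 0.0], [0.2, 0.0], [0.9, 0.1]))
bank.add(Experience("open_drawer", [0.0, 1.0], [0.0, 0.3], [0.1, 0.9]))

hit = bank.retrieve("open_drawer", [0.9, 0.2])   # closest to the first entry
action = corrected_action([0.5, -0.5], [1.0, 1.0], scale=0.1)
reward = similarity_reward([0.9, 0.1], hit)      # identical direction, so close to 1
```

In this sketch the only state that changes after deployment is the contents of the bank, which mirrors the idea of adapting through memory growth rather than backbone finetuning; a real system would replace the linear scan with an approximate nearest-neighbor index and learn the residual with an actor-critic method.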
