Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

Shaokai Wu, Yanbiao Ji, Qiuchang Li, Zhiyi Zhang, Qichen He, Wenyuan Xie, Guodong Zhang, Bayram Bayramli, Yue Ding, Hongtao Lu

TL;DR

The paper tackles the challenge of post-deployment learning for frozen Vision-Language-Action policies by introducing the Experience Feedback Network (EFN), a retrieval-conditioned residual that augments a frozen backbone with an online, growing memory of past experiences. EFN uses a dense similarity-based reward and Soft Actor-Critic to learn small action corrections on top of the base policy, while an experience bank stores task-conditioned trajectories to guide decisions without changing the backbone. Across LIBERO, CALVIN, and real-world AgiBot-G1 tasks, EFN yields consistent improvements over baselines, including retrieval-only and test-time training methods, while incurring modest computational overhead. These results demonstrate a practical, scalable path for deployment-time adaptation through memory growth rather than continual backbone finetuning, with broad implications for robust embodied intelligence.

Abstract

Embodied agents face a fundamental limitation: once deployed in real-world environments to perform specific tasks, they cannot acquire additional knowledge to improve task performance. In this paper, we propose Dejavu, a general post-deployment learning framework that employs an Experience Feedback Network (EFN) to augment a frozen Vision-Language-Action (VLA) policy with retrieved execution memories. EFN identifies contextually relevant prior action experiences and conditions action prediction on this retrieved guidance. We train EFN with reinforcement learning using semantic similarity rewards, so that the predicted actions align with past behaviors under current observations. During deployment, EFN continually enriches its memory with new trajectories, enabling the agent to exhibit "learning from experience". Experiments across diverse embodied tasks show that EFN improves adaptability, robustness, and success rates over frozen baselines. We provide code and a demo in our supplementary material.
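
The retrieve, correct, and grow loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Experience`, `ExperienceBank`, and `corrected_action`, the use of cosine similarity over plain embedding vectors, and the residual `scale` of 0.1 are all assumptions made for illustration. In the paper the residual comes from a policy trained with Soft Actor-Critic; here it is simply passed in as a vector.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


@dataclass
class Experience:
    """One stored step (hypothetical layout, not taken from the paper)."""
    key: list       # embedding of the observation at step t
    action: list    # action executed at step t
    next_key: list  # embedding of the observation at step t + 1


@dataclass
class ExperienceBank:
    """Append-only memory; the frozen backbone is never modified."""
    items: list = field(default_factory=list)

    def add(self, exp):
        self.items.append(exp)

    def top_k(self, query, k=1):
        # Rank stored experiences by similarity between the query and their keys.
        ranked = sorted(self.items, key=lambda e: cosine(query, e.key), reverse=True)
        return ranked[:k]


def corrected_action(base_action, residual, scale=0.1):
    """Add a small, clipped residual to the frozen policy's action."""
    return [a + scale * max(-1.0, min(1.0, r)) for a, r in zip(base_action, residual)]


# Usage: retrieve the nearest memory, correct the base action, then store the new step.
bank = ExperienceBank()
bank.add(Experience(key=[1.0, 0.0], action=[0.5, 0.0], next_key=[0.9, 0.1]))
bank.add(Experience(key=[0.0, 1.0], action=[0.0, 0.5], next_key=[0.1, 0.9]))

query = [0.9, 0.1]
nearest = bank.top_k(query, k=1)[0]
action = corrected_action(base_action=[0.5, 0.0], residual=[2.0, -0.5])
bank.add(Experience(key=query, action=action, next_key=nearest.next_key))
```

Clipping the residual keeps the correction small relative to the base action, which matches the description of EFN as learning small corrections on top of the base policy. The exact bound used in the paper is not stated in this summary.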

Paper Structure

This paper contains 65 sections, 47 equations, 14 figures, 9 tables, 1 algorithm.

Figures (14)

  • Figure 1: Top: a policy is trained once and then deployed with frozen weights, which prevents adaptation at test time. Bottom: a frozen VLA policy is augmented by an Experience Feedback Network that retrieves semantically relevant prior trajectories, produces residual corrections, and closes the loop with outcome similarity signals while keeping the base policy unchanged.
  • Figure 2: EFN trains a residual policy with SAC to nudge the base action so the next frame matches the retrieved memory's successor.
  • Figure 3: At inference, EFN efficiently retrieves candidates, applies the residual correction, and grows the experience bank online.
  • Figure 4: Visualization of EFN's language-conditioned retrieval. (a) PCA projection of instruction embeddings $\ell_\tau$ grouped by task type: similar instructions form clusters, so EFN can restrict retrieval to a small set of relevant rollouts. (b) PCA projection of experience keys $\mathbf{k}_i$ and online queries $\mathbf{q}_t$: for a given query, the retrieved top-$k$ neighbors (blue) lie close in this space.
  • Figure 5: Reward decomposition along a representative rollout under EFN's shaped objective. Early in the trajectory the agent makes progress and receives high reward; when it idles near a good view, the lazy penalty suppresses $r_t$; once it moves toward the retrieved successor frame again, the progress and motion terms dominate.
  • ...and 9 more figures
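
The reward behavior described in the captions of Figures 2 and 5 can be sketched as follows. The captions name a similarity signal toward the retrieved memory's successor frame, a progress term, a motion term, and a lazy penalty, but this summary does not give their formulas. The functional form, the weights `w_sim`, `w_prog`, and `w_motion`, the `lazy_penalty` value, and the `motion_eps` threshold below are therefore hypothetical, chosen only to reproduce the qualitative pattern.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def shaped_reward(prev_obs, next_obs, target, action,
                  w_sim=1.0, w_prog=1.0, w_motion=0.1,
                  lazy_penalty=0.5, motion_eps=1e-3):
    """Hypothetical shaped reward r_t; every weight here is an assumption.

    prev_obs, next_obs: embeddings of the frames before and after the action
    target: embedding of the retrieved memory's successor frame
    action: the executed (corrected) action vector
    """
    sim_next = cosine(next_obs, target)             # dense similarity term
    progress = sim_next - cosine(prev_obs, target)  # did this step move closer?
    motion = math.sqrt(sum(a * a for a in action))  # magnitude of the action
    reward = w_sim * sim_next + w_prog * progress + w_motion * motion
    if motion < motion_eps:
        # Idling near a good view still scores high similarity, so subtract a penalty.
        reward -= lazy_penalty
    return reward


# Idling at a frame that already matches the target versus moving toward it.
idle = shaped_reward([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 0.0])
moving = shaped_reward([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0])
```

With these assumed weights, the idle step is scored lower than the step that moves toward the retrieved successor frame, which is the pattern the Figure 5 caption describes.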