Never-Ending Behavior-Cloning Agent for Robotic Manipulation

Wenqi Liang; Gan Sun; Yao He; Yu Ren; Jiahua Dong; Yang Cong

TL;DR

Robotic manipulation in open-world settings demands continual learning with robust 3D scene understanding. The authors propose NBAgent, a Never-ending Behavior-cloning Agent, with three key components: a skill-shared semantic rendering module (SSR) that transfers 3D scene semantics via NeRF-guided supervision, a skill-shared representation distillation module (SRD) for cross-task representation distillation, and a skill-specific evolving planner (SEP) that learns skill-specific knowledge via a dynamic latent space and LoRA adapters. The approach is evaluated on RLBench and on a never-ending benchmark with Kitchen and Living Room tasks, where it is reported to achieve state-of-the-art performance and better resistance to forgetting than strong baselines. By separating skill-shared from skill-specific knowledge, the work advances open-ended, language-conditioned manipulation and shows practical potential for continual robotic learning in 3D-rich environments.

Abstract

Relying on multi-modal observations, embodied robots (e.g., humanoid robots) can perform multiple robotic manipulation tasks in unstructured real-world environments. However, most language-conditioned behavior-cloning agents still face two long-standing challenges, i.e., 3D scene representation and human-level task learning, when adapting to a series of new tasks in practical scenarios. We investigate these challenges with NBAgent, a pioneering language-conditioned Never-ending Behavior-cloning Agent for embodied robots, which continually learns observation knowledge of novel 3D scene semantics and robot manipulation skills from skill-shared and skill-specific attributes, respectively. Specifically, we propose a skill-shared semantic rendering module and a skill-shared representation distillation module to effectively learn 3D scene semantics from the skill-shared attribute, thereby keeping 3D scene representation from being overlooked. Meanwhile, we establish a skill-specific evolving planner to decouple manipulation knowledge, which can continually embed novel skill-specific knowledge, as humans do, in latent and low-rank spaces. Finally, we design a never-ending embodied robot manipulation benchmark, and extensive experiments demonstrate the significant performance of our method.
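
The idea of embedding skill-specific knowledge in a low-rank space can be illustrated with a generic low-rank adapter (LoRA-style) layer. The sketch below is not NBAgent's planner: it only shows, under stated assumptions, how a frozen shared weight matrix can be combined with one small trainable update per skill. The class `LoRALinear`, the helper `matvec`, the method names, and the skill name `open_drawer` are all hypothetical, and no training loop is included.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear map W plus one low-rank update B @ A per skill.

    Hypothetical illustration only; this is not the authors' implementation.
    """

    def __init__(self, in_dim, out_dim, rank, seed=0):
        self.in_dim, self.out_dim, self.rank = in_dim, out_dim, rank
        self.rng = random.Random(seed)
        # Shared weights: kept frozen once the base model has been trained.
        self.weight = [
            [self.rng.gauss(0.0, 0.1) for _ in range(in_dim)]
            for _ in range(out_dim)
        ]
        # Maps a skill name to its (A, B) low-rank factors.
        self.adapters = {}

    def add_skill(self, skill):
        """Register a new adapter; B starts at zero, so it is initially a no-op."""
        a = [
            [self.rng.gauss(0.0, 0.01) for _ in range(self.in_dim)]
            for _ in range(self.rank)
        ]
        b = [[0.0] * self.rank for _ in range(self.out_dim)]
        self.adapters[skill] = (a, b)

    def forward(self, x, skill=None):
        """Return W x, plus B (A x) when a skill adapter is selected."""
        y = matvec(self.weight, x)
        if skill is not None:
            a, b = self.adapters[skill]
            delta = matvec(b, matvec(a, x))
            y = [yi + di for yi, di in zip(y, delta)]
        return y


layer = LoRALinear(in_dim=4, out_dim=3, rank=2)
layer.add_skill("open_drawer")  # hypothetical skill name
x = [1.0, 2.0, 3.0, 4.0]
base = layer.forward(x)                      # shared path only
adapted = layer.forward(x, "open_drawer")    # shared path + skill adapter
```

Because each skill's factors are stored separately and the shared weights stay frozen, adding an adapter for a new skill in this sketch cannot alter the outputs produced for earlier skills, which is the general intuition behind using low-rank adapters to limit forgetting.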

Paper Structure (16 sections, 13 equations, 9 figures, 13 tables, 2 algorithms)

Introduction
Related Work
Methodology
Problem Definition and Overview
Skill-Shared Semantic Rendering Module
Skill-Shared Representation Distillation Module
Skill-Shared Representation Distillation Module
Skill-Specific Evolving Planner
Experiments
Implementation Details
Comparison Performance on RLBench
Skill-Wise Comparison Performance
Ablation Studies
Analysis of Task Complexity
Analysis of Computation Complexity
...and 1 more section

Figures (9)

Figure 1: Illustration of the proposed never-ending behavior-cloning robot learning. As illustrated in (a), behavior-cloning robot learning primarily focuses on training once on a fixed dataset and then relying on generalization to execute tasks in unseen environments, where a pre-trained CLIP model (Radford et al., 2021) serves as the language encoder to process the language instruction. As depicted in (b), the never-ending behavior-cloning framework enables robotic systems to progressively acquire novel manipulation skills in a continual learning manner, thereby demonstrating enhanced adaptability and generalization when confronted with unseen and challenging tasks.
Figure 2: Overview of the proposed NBAgent. It consists of a skill-shared semantic rendering module and a skill-shared representation distillation loss $\mathcal{L}_{\mathrm{SRD}}$, which transfer skill-shared knowledge of 3D scene semantics and keep 3D reasoning from being overlooked during continual learning, and a skill-specific evolving planner, which learns skill-specific knowledge and addresses catastrophic forgetting of learned skills.
Figure 3: Prediction examples on RLBench (James et al., 2020). For qualitative evaluation, we visualize three key frames from each manipulation task across all methods.
Figure 4: Visualization of manipulation skills in the Kitchen dataset, consisting of 10 manipulation skills pertinent to kitchen environments.
Figure 5: Visualization of manipulation skills in the Living Room dataset, including 12 manipulation skills associated with living room scenarios.
...and 4 more figures
