Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning

Minghe Gao; Juncheng Li; Yuze Lin; Xuqi Liu; Jiaming Ji; Xiaoran Pan; Zihan Xu; Xian Li; Mingjie Li; Wei Ji; Rong Wei; Rui Tang; Qizhou Wang; Kai Shen; Jun Xiao; Qi Wu; Siliang Tang; Yueting Zhuang

TL;DR

Arcadia reframes embodied AI as a closed real-to-sim-to-real lifecycle and introduces a four-component framework that tightly couples autonomous real-world data collection, generative scene reconstruction, a shared multimodal representation for navigation and manipulation, and deployment-informed simulation adaptation. By implementing a bidirectional feedback loop that updates assets, dynamics, and supervision, Arcadia achieves measurable gains in both simulated benchmarks and real-world robot tests, demonstrating robust generalization across tasks and domains. Ablation studies confirm that each lifecycle component contributes to performance, and real-world deployment signals are effectively leveraged to refine simulation and policies. The work also provides standardized interfaces to enable reproducible evaluation and cross-model comparisons, positioning Arcadia as a scalable foundation for lifelong, embodied agents.

Abstract

We contend that embodied learning is fundamentally a lifecycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages: (1) self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts the system to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle (continuous real-world data acquisition, generative simulation updates, and shared-representation learning) supports lifelong improvement and end-to-end generalization. We release standardized interfaces that enable reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
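The four-stage coupling described in the abstract can be pictured as a loop in which each stage consumes the previous stage's output and deployment feedback steers the next round of data collection. The following is a minimal illustrative sketch only, not Arcadia's implementation or released interface: every name in it (`LifecycleState`, `explore_and_ground`, `reconstruct_and_augment`, `train_shared_policy`, `evaluate_and_evolve`, `run_lifecycle`) is hypothetical, and each stage body is a placeholder that just records strings.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LifecycleState:
    """Hypothetical container for what the loop carries between rounds."""
    real_episodes: List[str] = field(default_factory=list)    # logged real-world data
    sim_scenes: List[str] = field(default_factory=list)       # reconstructed scenes
    policy_version: int = 0                                    # shared backbone version
    failure_reports: List[str] = field(default_factory=list)  # feedback from evaluation


def explore_and_ground(state: LifecycleState, round_id: int) -> None:
    # Stage 1 (placeholder): collect real-world data, targeting past failures.
    targets = state.failure_reports or ["initial-survey"]
    state.real_episodes.extend(f"round{round_id}/{t}" for t in targets)


def reconstruct_and_augment(state: LifecycleState) -> None:
    # Stage 2 (placeholder): rebuild the simulated scenes from all real episodes.
    state.sim_scenes = [f"scene<{ep}>" for ep in state.real_episodes]


def train_shared_policy(state: LifecycleState) -> None:
    # Stage 3 (placeholder): one update of the shared navigation/manipulation model.
    state.policy_version += 1


def evaluate_and_evolve(state: LifecycleState) -> None:
    # Stage 4 (placeholder): evaluation emits failures that drive the next round.
    state.failure_reports = [f"fail@v{state.policy_version}"]


def run_lifecycle(rounds: int) -> LifecycleState:
    """Run the closed loop; dropping any stage leaves later stages without input."""
    state = LifecycleState()
    for round_id in range(rounds):
        explore_and_ground(state, round_id)
        reconstruct_and_augment(state)
        train_shared_policy(state)
        evaluate_and_evolve(state)
    return state


if __name__ == "__main__":
    final = run_lifecycle(3)
    print(final.policy_version, final.real_episodes)
```

The point of the sketch is the data dependency: the failure reports written by the last stage are the only thing that changes what the first stage collects in the next round, which is one way to read the claim that removing a stage reverts the system to one-shot training.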
