Arcadia: Toward a Full-Lifecycle Framework for Embodied Lifelong Learning

Minghe Gao, Juncheng Li, Yuze Lin, Xuqi Liu, Jiaming Ji, Xiaoran Pan, Zihan Xu, Xian Li, Mingjie Li, Wei Ji, Rong Wei, Rui Tang, Qizhou Wang, Kai Shen, Jun Xiao, Qi Wu, Siliang Tang, Yueting Zhuang

TL;DR

Arcadia reframes embodied AI as a closed real-to-sim-to-real lifecycle and introduces a four-component framework that tightly couples autonomous real-world data collection, generative scene reconstruction, a shared multimodal representation for navigation and manipulation, and deployment-informed simulation adaptation. By implementing a bidirectional feedback loop that updates assets, dynamics, and supervision, Arcadia achieves measurable gains in both simulated benchmarks and real-world robot tests, demonstrating robust generalization across tasks and domains. Ablation studies confirm that each lifecycle component contributes to performance, and real-world deployment signals are effectively leveraged to refine simulation and policies. The work also provides standardized interfaces to enable reproducible evaluation and cross-model comparisons, positioning Arcadia as a scalable foundation for lifelong, embodied agents.

Abstract

We contend that embodied learning is fundamentally a lifecycle problem rather than a single-stage optimization. Systems that optimize only one link (data collection, simulation, learning, or deployment) rarely sustain improvement or generalize beyond narrow settings. We introduce Arcadia, a closed-loop framework that operationalizes embodied lifelong learning by tightly coupling four stages: (1) Self-evolving exploration and grounding for autonomous data acquisition in physical environments, (2) Generative scene reconstruction and augmentation for realistic and extensible scene creation, (3) a Shared embodied representation architecture that unifies navigation and manipulation within a single multimodal backbone, and (4) Sim-from-real evaluation and evolution that closes the feedback loop through simulation-based adaptation. This coupling is non-decomposable: removing any stage breaks the improvement loop and reverts to one-shot training. Arcadia delivers consistent gains on navigation and manipulation benchmarks and transfers robustly to physical robots, indicating that a tightly coupled lifecycle (continuous real-world data acquisition, generative simulation update, and shared-representation learning) supports lifelong improvement and end-to-end generalization. We release standardized interfaces enabling reproducible evaluation and cross-model comparison in reusable environments, positioning Arcadia as a scalable foundation for general-purpose embodied agents.
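
The four-stage coupling described in the abstract can be pictured as a loop in which each stage consumes the previous stage's output and the last stage feeds the first. The toy sketch below only illustrates that control flow; it is not Arcadia's implementation, and every name in it (`LifecycleState`, `explore`, `reconstruct`, `train`, `evaluate`, `run_round`) is hypothetical, with strings standing in for real data, scenes, and models.

```python
from dataclasses import dataclass, field


@dataclass
class LifecycleState:
    """Hypothetical container for what the loop carries between rounds."""
    real_logs: list = field(default_factory=list)  # stand-in for real-world data
    scenes: list = field(default_factory=list)     # stand-in for simulated scenes
    policy_version: int = 0                        # stand-in for the shared model
    feedback: list = field(default_factory=list)   # stand-in for deployment signals


def explore(state, round_id):
    # Stage 1 (placeholder): collect data, steered by the latest feedback if any.
    target = state.feedback[-1] if state.feedback else "initial-sweep"
    state.real_logs.append(f"log-{round_id}<-{target}")


def reconstruct(state):
    # Stage 2 (placeholder): rebuild the scene set from all collected logs.
    state.scenes = [f"scene({log})" for log in state.real_logs]


def train(state):
    # Stage 3 (placeholder): update the shared model on the current scenes.
    if state.scenes:
        state.policy_version += 1


def evaluate(state, round_id):
    # Stage 4 (placeholder): record a deployment signal for the next round.
    state.feedback.append(f"feedback-{round_id}")


def run_round(state, round_id):
    """Run the four placeholder stages once, in order."""
    explore(state, round_id)
    reconstruct(state)
    train(state)
    evaluate(state, round_id)
    return state


state = LifecycleState()
for round_id in range(3):
    run_round(state, round_id)
```

In this toy version, dropping `evaluate` leaves `explore` with no feedback to act on, so every round repeats the initial sweep. That mirrors, in a very loose way, the abstract's point that removing a stage reverts the loop to one-shot training.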

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Overview of Arcadia’s full real-to-sim-to-real lifecycle for embodied learning, illustrating how the framework closes the loop between real-world experience, simulation, and redeployment while addressing four core limitations in contemporary embodied AI: exogenous data dependence, static pre-rendered environments, fragmented model architectures, and sparse real-world feedback.
  • Figure 2: Overview of Arcadia’s real-to-sim pipeline: robots autonomously explore real environments to collect multimodal data (Step 1), which are then reconstructed and augmented into editable 3D scenes for simulation (Step 2).
  • Figure 2: Ablation study on each component.
  • Figure 3: Overview of Arcadia’s sim-to-real pipeline: navigation and manipulation trajectories are collected in simulation (Step 3) using A* and RRT planning within a shared embodied architecture, then real-world feedback is integrated for continual refinement (Step 4).
  • Figure 4: Navigation route from the living and dining area to the kitchen by Unitree G1 robot. (a) Floor plan showing the designated path. (b) 3D rendering of the environment with the same route. (c) First-person visual sequence along the path. (d) Third-person view.
  • ...and 2 more figures
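
The Figure 3 caption mentions A* planning for collecting trajectories in simulation. As a rough illustration of what such a planner computes, here is a minimal A* search on a 4-connected occupancy grid with a Manhattan-distance heuristic. The grid encoding (`#` for obstacles), the unit step cost, and the function name `astar` are assumptions made for this sketch; the summary does not specify Arcadia's map representation or planner configuration.

```python
import heapq


def astar(grid, start, goal):
    """Return a shortest 4-connected path from start to goal, or None.

    grid is a list of equal-length strings in which '#' marks an obstacle.
    Cells are (row, col) tuples and every step costs 1.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    came_from = {start: None}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = cell
                    heapq.heappush(
                        open_heap,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None


example_grid = [
    "....",
    ".##.",
    "....",
]
example_path = astar(example_grid, (0, 0), (2, 3))
```

The returned path is a list of grid cells from start to goal; a real pipeline would still need to turn such a cell sequence into robot motion commands, which this sketch does not attempt.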