Toward Embodied AGI: A Review of Embodied AI and the Road Ahead
Yequan Wang, Aixin Sun
TL;DR
The paper tackles the gap between current embodied AI and true Embodied AGI by proposing a five-level taxonomy (L1–L5) and four capability axes to benchmark progress. It analyzes the current end-to-end and plan-and-act paradigms, identifies the bottlenecks that hinder progress to L3 and beyond, and presents a conceptual L3+ framework comprising an omnimodal model architecture and lifelong, physically grounded training. Its contributions include a structured roadmap, a critical capability analysis, and a conceptual framework intended to inspire future research toward open-ended, real-time, humanoid-capable embodied agents. Its significance lies in offering a principled direction for integrating multimodal perception, human-like cognition, and robust generalization with hardware evolution and safety considerations on the path to practical Embodied AGI.
Abstract
Artificial General Intelligence (AGI) is often envisioned as inherently embodied. With recent advances in robotics and foundational AI models, we stand at the threshold of a new era, one marked by increasingly generalized embodied AI systems. This paper contributes to the discourse by introducing a systematic taxonomy of Embodied AGI spanning five levels (L1-L5). We review existing research and challenges at the foundational stages (L1-L2) and outline the key components required to achieve higher-level capabilities (L3-L5). Building on these insights and existing technologies, we propose a conceptual framework for an L3+ robotic brain, offering both a technical outlook and a foundation for future exploration.
