
A call for embodied AI

Giuseppe Paolo, Jonas Gonzalez-Billandon, Balázs Kégl

TL;DR

The paper argues that achieving artificial general intelligence (AGI) requires embodied AI that learns through real-world interaction rather than relying solely on static foundation models such as LLMs. It proposes a theoretical framework based on cognitive architectures, built on four components (perception, action, memory, and learning) and integrated via Friston's active inference. Key contributions include a detailed articulation of this four-component framework, a discussion of the role of simulators, and the identification of open challenges (new learning theory, noise, generalization, human interaction, and hardware constraints) as a roadmap for future work. The authors contend that grounding AI in embodiment lets agents acquire grounded knowledge, learn affordances, and adapt continuously in dynamic environments, enabling safer human–AI collaboration and scalable progress toward AGI. They also contrast SMAIs with LLMs, arguing that embodied, interaction-driven systems offer more natural alignment opportunities and practical benefits, while still raising alignment and safety considerations that demand principled evaluation.

Abstract

We propose Embodied AI (EAI) as the next fundamental step in the pursuit of Artificial General Intelligence (AGI), juxtaposing it against current AI advancements, particularly Large Language Models (LLMs). We traverse the evolution of the embodiment concept across diverse fields (philosophy, psychology, neuroscience, and robotics) to highlight how EAI distinguishes itself from the classical paradigm of static learning. By broadening the scope of EAI, we introduce a theoretical framework based on cognitive architectures, emphasizing perception, action, memory, and learning as essential components of an embodied agent. This framework is aligned with Friston's active inference principle, offering a comprehensive approach to EAI development. Despite the progress made in the field of AI, substantial challenges persist, such as the formulation of a novel AI learning theory and the innovation of advanced hardware. Our discussion lays down a foundational guideline for future EAI research. Highlighting the importance of creating EAI agents capable of seamless communication, collaboration, and coexistence with humans and other intelligent entities within real-world environments, we aim to steer the AI community towards addressing the multifaceted challenges and seizing the opportunities that lie ahead in the quest for AGI.
