Large language models for artificial general intelligence (AGI): A survey of foundational principles and approaches
Alhassan Mumuni, Fuseini Mumuni
TL;DR
The paper surveys four foundational principles (embodiment, symbol grounding, causality, and memory) as essential components for achieving artificial general intelligence (AGI) with large language models and multimodal foundation models. It analyzes current state-of-the-art approaches, including embodied agents, knowledge graphs, ontology-driven prompting, retrieval-augmented generation (RAG), physics-informed world models, and neuro-symbolic grounding, and highlights their roles and limitations. It then proposes a holistic AGI framework that interconnects embodiment, grounding, causality, and memory, and illustrates how integrating them can enable robust, generalizable intelligent agents. The discussion emphasizes the need for unified design, scalable data, and interactive environments to advance toward human-level general intelligence.
Abstract
Generative artificial intelligence (AI) systems based on large-scale pretrained foundation models (PFMs), such as vision-language models, large language models (LLMs), diffusion models and vision-language-action (VLA) models, have demonstrated the ability to solve complex, non-trivial AI problems in a wide variety of domains and contexts. Multimodal large language models (MLLMs), in particular, learn from vast and diverse data sources. This allows them to form rich and nuanced representations of the world and thereby provides extensive capabilities, including the ability to reason; engage in meaningful dialog; collaborate with humans and other agents to jointly solve complex problems; and understand social and emotional aspects of humans. Despite these impressive feats, the cognitive abilities of state-of-the-art LLMs trained on large-scale datasets are still superficial and brittle. Consequently, generic LLMs are severely limited in their generalist capabilities. A number of foundational problems, namely embodiment, symbol grounding, causality and memory, must be addressed for LLMs to attain human-level general intelligence. These concepts are closely aligned with human cognition and provide LLMs with inherent human-like cognitive properties that support physically plausible, semantically meaningful, flexible and more generalizable knowledge and intelligence. In this work, we discuss these foundational issues and survey state-of-the-art approaches for implementing these concepts in LLMs. Specifically, we examine how the principles of embodiment, symbol grounding, causality and memory can be leveraged toward the attainment of artificial general intelligence (AGI) in an organic manner.
