The Vector Grounding Problem
Dimitri Coelho Mollo, Raphaël Millière
TL;DR
The paper recasts the classical symbol grounding problem for the vector-based representations used by LLMs, calling it the Vector Grounding Problem: can an LLM's internal states and outputs have intrinsic meaning, independent of external interpretation? It argues that two ingredients are sufficient for referential grounding in LLMs: causal-informational relations to the world, and a history of selection that endows internal states with the function of carrying information about the world. Grounding can arise via three routes: post-training preference tuning, pre-training under certain conditions, and transient in-context learning (mesa-optimisation). The authors discuss implications for identity, multimodality, and embodiment, arguing that intrinsic meaning in outputs is possible in principle, even if grounding does not entail full cognition or consciousness.
Abstract
Large language models (LLMs) produce seemingly meaningful outputs, yet they are trained on text alone without direct interaction with the world. This leads to a modern variant of the classical symbol grounding problem in AI: can LLMs' internal states and outputs be about extra-linguistic reality, independently of the meaning human interpreters project onto them? We argue that they can. We first distinguish referential grounding (the connection between a representation and its worldly referent) from other forms of grounding and argue it is the only kind essential to solving the problem. We contend that referential grounding is achieved when a system's internal states satisfy two conditions derived from teleosemantic theories of representation: (1) they stand in appropriate causal-informational relations to the world, and (2) they have a history of selection that has endowed them with the function of carrying this information. We argue that LLMs can meet both conditions, even without multimodality or embodiment.
