The Vector Grounding Problem

Dimitri Coelho Mollo; Raphaël Millière

TL;DR

The paper reframes the grounding problem for LLMs, shifting it from symbolic to vector-based representations: the Vector Grounding Problem asks whether an LLM's internal states and outputs can have intrinsic meaning, independent of the interpretation humans project onto them. It argues that two ingredients are sufficient for referential grounding in LLMs: causal-informational relations to the world, and a history of selection that endows internal states with the function of carrying information about the world. Grounding can arise via three routes: post-training preference tuning, pre-training under certain conditions, and transient in-context learning (mesa-optimisation). The authors discuss implications for identity, multimodality, and embodiment, arguing that intrinsic meaning in outputs is possible in principle even if grounding does not entail full cognition or consciousness.

Abstract

Large language models (LLMs) produce seemingly meaningful outputs, yet they are trained on text alone without direct interaction with the world. This leads to a modern variant of the classical symbol grounding problem in AI: can LLMs' internal states and outputs be about extra-linguistic reality, independently of the meaning human interpreters project onto them? We argue that they can. We first distinguish referential grounding -- the connection between a representation and its worldly referent -- from other forms of grounding and argue it is the only kind essential to solving the problem. We contend that referential grounding is achieved when a system's internal states satisfy two conditions derived from teleosemantic theories of representation: (1) they stand in appropriate causal-informational relations to the world, and (2) they have a history of selection that has endowed them with the function of carrying this information. We argue that LLMs can meet both conditions, even without multimodality or embodiment.

Paper Structure (23 sections, 1 figure)

Introduction
The Classical Symbol Grounding Problem
The Vector Grounding Problem
Five notions of Grounding
Referential Grounding
Sensorimotor Grounding
Relational Grounding
Communicative Grounding
Epistemic Grounding
What Grounding for LLMs?
Theories of Representational Content: Causes and History
Grounding Generative Pre-trained Models
Causal-informational relations
Selection history
The Argument from Post-Training
...and 8 more sections

Figures (1)

Figure 1: Five notions of grounding. A. Referential grounding: a lexical representation (<DOG>) is connected to its worldly referent (a dog). B. Sensorimotor grounding: a lexical representation (<DOG>) is connected to a sensory representation (image of a dog). C. Relational grounding: a lexical representation (<DOG>) is connected to other lexical representations (<PET>, <FURRY>). D. Communicative grounding: two speakers calibrate their interpretation of an exchange to make use of the same lexical concept (<DOG>). E. Epistemic grounding: a lexical representation (<DOG>) is connected to information stored in a knowledge base (about dog facts).
