A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem

Luciano Floridi, Yiyang Jia, Fernando Tohmé

TL;DR

The paper introduces a categorical framework, based on the category $\mathbf{Rel}$ of sets and relations, to formalize how humans and LLMs transform content into truth-conditional propositions about possible worlds. It models two epistemic paths, a human route and an LLM route, and requires entailment-commutativity between them. On this basis it shows that LLM outputs can be sound within the domain of human-grounded content, yet do not achieve genuine symbol grounding. Instead, they circumvent the problem by parasitically leveraging content that humans have already grounded. The work unifies diverse failure modes under a single entailment criterion and discusses extensions to probabilistic settings. It also highlights that hallucinations are inevitable outside the sound domain. It further argues that genuine grounding requires perceptual, causal, and normative engagement with the world. LLMs currently lack these capabilities, even with multimodal enhancements, so the paper positions them as advanced interfaces rather than grounded knowers. The framework aims to bring clarity to how LLMs are evaluated and deployed. It emphasizes responsible use within domains where entailment holds and careful verification elsewhere.
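The summary does not spell out the paper's definitions, so the following is only a minimal sketch under stated assumptions. Each morphism of $\mathbf{Rel}$ is modelled as a finite set of pairs, and routes are built by ordinary relational composition. "Entailment-commutativity" at a state $h \in H$ is read here as set inclusion: every world the LLM route admits must also be admitted by the human route. The relation names (`human`, `prompt`, `generate`, `read`) and the toy states, texts, and worlds are hypothetical illustrations, not the authors' notation.

```python
from collections.abc import Hashable

# Assumption: a morphism A -> B in Rel is a finite set of (a, b) pairs.
Rel = set[tuple[Hashable, Hashable]]


def compose(r: Rel, s: Rel) -> Rel:
    """Relational composition: apply r (A -> B) first, then s (B -> C)."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}


def image(r: Rel, x: Hashable) -> set:
    """All targets that r relates to the source element x."""
    return {y for (x0, y) in r if x0 == x}


def entails_at(llm_route: Rel, human_route: Rel, h: Hashable) -> bool:
    """Hypothetical soundness test at h: the worlds reached via the LLM
    route form a subset of those reached via the human route.
    Note that an empty LLM image passes vacuously."""
    return image(llm_route, h) <= image(human_route, h)


# Hypothetical toy data: epistemic states h*, texts q*/a*, worlds w*.
human = {("h1", "w1"), ("h1", "w2"), ("h2", "w3")}  # human route H -> W
prompt = {("h1", "q1"), ("h2", "q2")}               # H -> prompts
generate = {("q1", "a1"), ("q2", "a2")}             # prompts -> LLM outputs
read = {("a1", "w1"), ("a2", "w2")}                 # outputs -> worlds, via human-grounded meaning

# LLM route H -> W as a composite of the three hypothetical relations.
llm = compose(compose(prompt, generate), read)

# States where the two routes agree in the entailment sense.
sound_domain = {h for h in ("h1", "h2") if entails_at(llm, human, h)}
print(sorted(llm))    # [('h1', 'w1'), ('h2', 'w2')]
print(sound_domain)   # {'h1'}
```

In this toy case the LLM route is sound at `h1`, since `{'w1'}` is included in `{'w1', 'w2'}`. At `h2` it admits `w2`, a world the human route excludes, which models an output outside the sound domain. The `read` relation carries the interpretive work, reflecting the claim that the LLM route borrows human grounding rather than supplying its own.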

Abstract

This paper presents a formal, categorical framework for analysing how humans and large language models (LLMs) transform content into truth-evaluated propositions about a state space of possible worlds $W$, in order to argue that LLMs do not solve but circumvent the symbol grounding problem.

Paper Structure

This paper contains 19 sections, 2 theorems, 13 equations, 2 figures.

Key Result

Proposition 2.1

For any $h \in H$:

Figures (2)

  • Figure 1: Schematic representation of the human and LLM routes in the category $\mathcal{C}$, a subcategory of $\mathbf{Rel}$. The diagram illustrates three possible mappings from the space of human epistemic states $H$ to the state space of possible worlds $W$.
  • Figure 2: The training pipeline: from content to the space of trained models.

Theorems & Definitions (4)

  • Proposition 2.1
  • Proof
  • Proposition 2.2
  • Proof