Perspectives on Large Language Models: Polysemy, Stochasticity, Exponential Expressibility, and Unitary Attention

Karl Svozil

TL;DR

The paper investigates how large language models resolve polysemy, manage stochastic generation, and conceptually align with quantum-inspired formalisms. It argues that expressive capacity grows exponentially with embedding dimension via quasi-orthogonal feature directions and that dynamic self-attention is central to disambiguating meaning, while stochastic sampling fuels creative output. A quantum attention framework is introduced as a unitary extension of classical attention, reframing LLM computation as reversible dynamics in Hilbert space with a final measurement yielding tokens. The work offers a conceptual bridge between deep learning and quantum theory, highlighting implications for interpretability, robustness, and future computational paradigms in language technologies.

Abstract

This paper explores foundational aspects of Large Language Models (LLMs). We analyze how the expressibility of semantic features scales exponentially with embedding space dimensions using quasi-orthogonal vectors. We contrast the dynamic, context-dependent embeddings of Transformer architectures, which resolve polysemy, with a static vector approach based on quantum contextuality. Stochasticity is framed as an essential feature for enabling creative output through probabilistic sampling. Finally, we propose quantum attention as a unitary extension of classical mechanisms, reframing LLM processing as reversible, quantum-like evolutions in Hilbert space.
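The quasi-orthogonality claim can be made concrete with a small numerical sketch. It is not the paper's construction: it only illustrates the standard observation that randomly drawn unit vectors become nearly orthogonal as the dimension grows, so many more than `dim` directions can coexist with small pairwise overlap. The function name `max_abs_cosine`, the Gaussian sampling, and the chosen sizes are assumptions made for illustration.

```python
import numpy as np


def max_abs_cosine(num_vectors: int, dim: int, seed: int = 0) -> float:
    """Largest |cosine similarity| between distinct random unit vectors.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    # Draw random directions and normalize each row to unit length.
    vectors = rng.standard_normal((num_vectors, dim))
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)
    # Gram matrix of pairwise cosines; zero the diagonal (self-overlap is 1).
    gram = vectors @ vectors.T
    np.fill_diagonal(gram, 0.0)
    return float(np.abs(gram).max())


if __name__ == "__main__":
    # 2000 vectors is more than any dimension tried here, so the vectors
    # cannot all be exactly orthogonal; the worst-case overlap still shrinks.
    for dim in (8, 64, 512):
        print(dim, max_abs_cosine(num_vectors=2000, dim=dim))
```

The worst-case overlap falls as `dim` increases even though the number of vectors stays fixed above the dimension. That is the sense in which tolerating a small nonzero overlap admits far more feature directions than an exactly orthogonal basis would.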

Paper Structure

This paper contains 54 sections, 35 equations, 1 figure.

Figures (1)

  • Figure 1: Hypergraph visualization of three intertwining contexts for the word 'run'. The central vector $\bm{v}(\text{run})$ is shared by three distinct contexts (orthonormal bases). Each context connects $\bm{v}(\text{run})$ to a different semantic association: physical activity $\bm{v}(\text{exercise})$, management $\bm{v}(\text{business})$, and flow $\bm{v}(\text{river})$.