On the Expressive Power of Contextual Relations in Transformers

Demián Fraiman

Abstract

Transformer architectures have achieved remarkable empirical success in modeling contextual relationships in natural language, yet a precise mathematical characterization of their expressive power remains incomplete. In this work, we introduce a measure-theoretic framework for contextual representations in which texts are modeled as probability measures over a semantic embedding space, and contextual relations between words are represented as coupling measures between them. Within this setting, we introduce the Sinkhorn Transformer, a transformer-like architecture. Our main result is a universal approximation theorem: any continuous coupling function between probability measures that encodes the semantic relation as a coupling measure can be uniformly approximated by a Sinkhorn Transformer with appropriate parameters.
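The paper's Sinkhorn Transformer architecture is not reproduced in this summary; the sketch below illustrates only the classical Sinkhorn iteration for entropic optimal-transport couplings between discrete probability measures, which the architecture's name references. The function `sinkhorn`, its parameters `eps` and `n_iters`, and the toy embeddings are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps=0.1, n_iters=200):
    """Entropic OT coupling between discrete measures mu (n,) and nu (m,)
    with cost matrix C (n, m). Returns a coupling P with marginals ~ mu, nu."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iters):
        u = mu / (K @ v)                 # scale rows to match mu
        v = nu / (K.T @ u)               # scale columns to match nu
    return u[:, None] * K * v[None, :]   # diag(u) K diag(v)

# Toy example: two "texts" as empirical measures over a 3-d embedding space.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))              # word embeddings of text 1
y = rng.normal(size=(7, 3))              # word embeddings of text 2
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared distances
mu = np.full(5, 1 / 5)                   # uniform weights
nu = np.full(7, 1 / 7)
P = sinkhorn(mu, nu, C)
assert np.allclose(P.sum(axis=1), mu, atol=1e-6)    # marginal constraint
```

In this discrete setting, the coupling `P` plays the role of the contextual relation between the two texts: entry `P[i, j]` weights how strongly word `i` of the first text relates to word `j` of the second.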

Paper Structure

This paper contains 19 sections, 20 theorems, 113 equations, and 5 figures.

Key Result

Proposition 4.1

The affine space $\mathcal{A}$ is closed.
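The surrounding definitions for Proposition 4.1 are not included in this summary, so the meaning of $\mathcal{A}$ is an assumption here. One plausible reading, given the paper's coupling-measure framework, is that $\mathcal{A}$ is the affine set of measures with prescribed marginals, in which case closedness follows from continuity of the marginal (pushforward) maps:

```latex
% Assumed reading of \mathcal{A} (not stated in this summary):
% measures on X \times X whose two marginals are fixed to \mu and \nu.
\[
  \mathcal{A} = \bigl\{ \gamma \in \mathcal{M}(X \times X) :
    (\pi_1)_{\#}\gamma = \mu,\ (\pi_2)_{\#}\gamma = \nu \bigr\}.
\]
% If \gamma_n \rightharpoonup \gamma weakly, then
% (\pi_i)_{\#}\gamma_n \rightharpoonup (\pi_i)_{\#}\gamma weakly, so the
% marginal constraints pass to the limit and \mathcal{A} is weakly closed.
```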

Figures (5)

  • Figure 1: Contextual semantic graph for a single text.
  • Figure 2: Contextual semantic graph for two texts.
  • Figure 3: Sinkhorn Transformer.
  • Figure 4: Sinkhorn Transformer architecture in the single-text setting.
  • Figure 5: Sinkhorn Transformer architecture in the two-text setting.

Theorems & Definitions (39)

  • Definition 3.1: Coupling System
  • Definition 3.2: Measure-valued multi-head attention
  • Definition 3.3: Context maps and context operators
  • Definition 3.4: Composition of context maps
  • Definition 3.5: Deep Transformer
  • Proposition 4.1
  • Proposition 4.2
  • Corollary 4.3
  • Proposition 5.1
  • Proposition 5.2
  • ...and 29 more