
Deep Sequence Modeling with Quantum Dynamics: Language as a Wave Function

Ahmed Nebli, Hadi Saadatdoorabi, Kevin Yam

TL;DR

A sequence modeling framework in which the latent state is a complex-valued wave function evolving on a finite-dimensional Hilbert space under a learned, time-dependent Hamiltonian. The Hamiltonian steers the phases of complex amplitudes so that quantum interference cancels conflicting interpretations and reinforces compatible ones.

Abstract

We introduce a sequence modeling framework in which the latent state is a complex-valued wave function evolving on a finite-dimensional Hilbert space under a learned, time-dependent Hamiltonian. Unlike standard recurrent architectures, which rely on gating mechanisms to suppress competing hypotheses, our framework uses quantum interference: the Hamiltonian steers the phases of complex amplitudes so that conflicting interpretations cancel while compatible ones reinforce. The dynamics are discretized with a Cayley (Crank–Nicolson) scheme, so every update is exactly unitary and the state norm is preserved at every time step. Token probabilities are extracted with the Born rule, a quadratic readout that couples magnitudes and relative phases. Our primary theoretical contribution is a separation theorem characterizing the representational advantage of this readout: we define a family of disambiguation tasks that a complex unitary model of dimension $N$ solves exactly, but that requires a state dimension of $\Omega(N^2)$ for any real-valued orthogonal model equipped with a standard affine-softmax readout. This quadratic gap arises because the Born rule implicitly lifts the $N$-dimensional state into the space of rank-one Hermitian matrices, accessing pairwise phase correlations that are inaccessible to linear projections. Finally, we derive a continuity equation for the latent probability mass, yielding conserved pairwise currents that serve as a built-in diagnostic for tracing information flow between dimensions.
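A minimal NumPy sketch of the Born-rule readout $p_k = \psi^\dagger M_k \psi = \operatorname{tr}(M_k\rho)$ with $\rho = \psi\psi^\dagger$ follows. The two-outcome measurements below (a diagonal one and one in the rotated $(|0\rangle \pm |1\rangle)/\sqrt{2}$ basis) are hypothetical choices for $N = 2$, not the paper's learned measurement operators. They illustrate how the quadratic readout couples magnitudes and relative phases: two states with identical magnitudes but opposite relative phase are indistinguishable under the diagonal measurement and perfectly distinguishable under the rotated one.

```python
import numpy as np

def born_probabilities(psi, povm):
    """p_k = psi^dagger M_k psi = tr(M_k rho), with the rank-one lift rho = psi psi^dagger.

    Expanding psi^dagger M psi gives sum_j M_jj |psi_j|^2
    + 2 * sum_{j<k} Re(conj(psi_j) M_jk psi_k): the off-diagonal
    entries of M_k read the pairwise phase cross-terms.
    """
    rho = np.outer(psi, psi.conj())  # quadratic lift of the state
    # tr(M_k rho) is linear in rho and real for Hermitian M_k (up to rounding)
    return np.array([np.trace(M @ rho).real for M in povm])

# Hypothetical measurements for N = 2.
basis = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]        # magnitude-only readout
plus = np.array([1.0, 1.0]) / np.sqrt(2)
minus = np.array([1.0, -1.0]) / np.sqrt(2)
rotated = [np.outer(plus, plus), np.outer(minus, minus)]  # phase-sensitive readout

# Same magnitudes |psi_j| = 1/sqrt(2), opposite relative phase.
psi_in_phase = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
psi_out_of_phase = np.array([1.0, -1.0], dtype=complex) / np.sqrt(2)

p_c = born_probabilities(psi_in_phase, basis)
p_d = born_probabilities(psi_out_of_phase, basis)
p_a = born_probabilities(psi_in_phase, rotated)      # constructive on outcome 0
p_b = born_probabilities(psi_out_of_phase, rotated)  # destructive on outcome 0
```

Both states give $[0.5, 0.5]$ under the diagonal measurement, but $[1, 0]$ versus $[0, 1]$ under the rotated one, because the cross-term $2\,\operatorname{Re}(\bar\psi_0 M_{01}\psi_1)$ changes sign with the relative phase.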

Paper Structure

This paper contains 70 sections, 8 theorems, 68 equations, 3 figures, and 2 tables.

Key Result

Proposition 3.1

If $H_{\mathrm{int},I}(t)$ is Hermitian, then $W(t)$ is unitary for every $\Delta t > 0$.
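The following NumPy sketch illustrates Proposition 3.1 under assumed conventions: $\hbar = 1$, Schrödinger dynamics $i\,\dot\psi = H\psi$, an interaction-picture Hamiltonian $H_I(t) = e^{iH_0 t}\,\Phi\Phi^\dagger\,e^{-iH_0 t}$ with diagonal $H_0$, and the Cayley step $W = (I + \tfrac{i\Delta t}{2}H_I)^{-1}(I - \tfrac{i\Delta t}{2}H_I)$. The random `h0` and `Phi` are hypothetical stand-ins for the outputs of $g_\theta$, and the real vector $\delta(t)$ from Figure 1 is omitted because its role is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
N, r, dt, t = 4, 2, 0.1, 0.7  # state dim, rank of Phi, step size, time (illustrative)

# Hypothetical stand-ins for the learned components.
h0 = rng.normal(size=N)                                       # real diagonal of H_0
Phi = rng.normal(size=(N, r)) + 1j * rng.normal(size=(N, r))
V = Phi @ Phi.conj().T                                        # Hermitian by construction

# Interaction picture (assumed convention): H_I = e^{i H_0 t} V e^{-i H_0 t}.
# H_0 is diagonal, so conjugation by its exponential is an elementwise phase.
phase = np.exp(1j * h0 * t)
H_I = phase[:, None] * V * phase.conj()[None, :]

def cayley(H, dt):
    """Crank-Nicolson / Cayley step W = (I + i dt H/2)^{-1} (I - i dt H/2)."""
    eye = np.eye(H.shape[0], dtype=complex)
    return np.linalg.solve(eye + 0.5j * dt * H, eye - 0.5j * dt * H)

W = cayley(H_I, dt)
W_big = cayley(H_I, 100.0)  # unitarity does not depend on a small step size

psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)
psi_next = W @ psi          # one norm-preserving update
```

Unitarity holds for any $\Delta t$ because $I + \tfrac{i\Delta t}{2}H_I$ and $I - \tfrac{i\Delta t}{2}H_I$ commute and are adjoints of each other, so the factors cancel in $W^\dagger W$. A small $\Delta t$ matters for accuracy (the scheme is second-order), not for norm preservation.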

Figures (3)

  • Figure 1: Detailed architecture of a single time step in the quantum sequence model. The neural network $g_\theta$ (the sole unconstrained learned component, bold outline) receives the token embedding and the current interaction-picture state, and outputs the complex matrix $\Phi(t)$ and real vector $\delta(t)$. The outer product $\Phi\Phi^\dagger$ is Hermitian by construction; adding the learned diagonal $H_0$ yields the full Hamiltonian $H(t)$. The interaction picture removes the known free oscillations, and the Cayley transform discretizes the remaining evolution into an exactly unitary update $W(t)$. Unitarity preserves $\|\psi\|=1$ at every step, which ensures the Born rule produces a valid probability distribution over the vocabulary. Left-side annotations mark learned components; right-side annotations trace the algebraic guarantees. The dashed line indicates recurrent state feedback, which makes the overall dynamics nonlinear despite each individual step being a linear (unitary) map.
  • Figure 2: Multi-scale view of the quantum sequence model. (a) The model unrolled over $T$ time steps. At each step the same network $g_\theta$ generates a token- and state-dependent Hamiltonian; the Cayley update advances the state on the complex unit sphere; and the Born rule reads out token probabilities. The loss aggregates log-probabilities across all steps. (b) Illustration of the interference mechanism. Processing the disambiguating token "steep" after the prefix "The bank was" causes probability to flow from the financial interpretation to the river interpretation via conserved, antisymmetric probability currents $J_{j\leftarrow k}$. (c) Detail of the Born-rule readout. The quadratic map $\psi\mapsto\psi\psi^\dagger$ lifts the $N$-dimensional complex state into the $N^2$-dimensional space of Hermitian matrices, exposing both magnitude and phase cross-terms to the linear measurement $\operatorname{tr}(M_k\rho)$.
  • Figure 3: Source of the expressivity separation. Left: The Born-rule readout applies a quadratic (Veronese-type) lifting $\psi\mapsto\psi\psi^\dagger$, promoting the $N$-dimensional complex state to a rank-one matrix in the $N^2$-dimensional real space of Hermitian matrices. The measurement $\operatorname{tr}(M_k\rho)$ is then linear in the lifted space, accessing all $N^2$ features, including the $\binom{N}{2}$ pairwise phase cross-terms that encode interference. Right: The affine-softmax readout computes $z_k = w_k^\top h + b_k$, which is linear in the $d$-dimensional real state. The softmax rank lemma limits the log-probability matrix to rank at most $d{+}2$. Bottom: Matching the $N^2$ features required by the Born-rule target forces $d \geq N^2 - 2$, a quadratic gap over the complex model's dimension $N$, under the full-rank condition on $L^*$ stated in the separation theorem.
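The rank constraint in Figure 3 can be checked numerically. In this sketch the hidden states, readout weights, and sizes are arbitrary random stand-ins for a real orthogonal model's states and its affine-softmax readout, not the paper's construction. Every log-softmax matrix $L_{ik} = w_k^\top h_i + b_k - c_i$, with $c_i$ the log-partition of state $i$, factors through an inner dimension of $d+2$ (the $d$ state coordinates, a constant column carrying the biases, and the column of log-partitions), so its rank is at most $d+2$ however large the vocabulary is.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, vocab, d = 50, 40, 5  # number of hidden states, vocabulary size, real state dim

states = rng.normal(size=(n_states, d))  # rows are hidden states h_i
Wout = rng.normal(size=(d, vocab))       # columns are readout vectors w_k
b = rng.normal(size=vocab)               # biases b_k

logits = states @ Wout + b               # z_ik = w_k^T h_i + b_k
# Log-partition c_i = logsumexp_k z_ik, shifted by the row max for stability.
m = logits.max(axis=1, keepdims=True)
c = m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
L = logits - c                           # L_ik = log p(k | h_i)

# Explicit factorization L = left @ right through d + 2 inner dimensions.
left = np.hstack([states, np.ones((n_states, 1)), -c])    # n_states x (d + 2)
right = np.vstack([Wout, b[None, :], np.ones((1, vocab))])  # (d + 2) x vocab

rank = np.linalg.matrix_rank(L)
```

The bias row and the log-partition column are what contribute the "+2"; the vocabulary size never enters the bound.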

Theorems & Definitions (21)

  • Proposition 3.1: Unitarity of the Cayley update
  • proof
  • Proposition 4.1: Structural properties of the probability current
  • proof
  • Definition 5.1: Complex Unitary Sequence Model (CUSM)
  • Definition 5.2: Real Orthogonal Sequence Model (ROSM)
  • Definition 5.3: General-position unitaries
  • Definition 5.4: Informationally complete measurement
  • Proposition 5.5: CUSM upper bound
  • proof
  • ...and 11 more
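Proposition 4.1 and Figure 2 refer to conserved, antisymmetric probability currents. The sketch below uses the standard continuous-time form, assuming $\hbar = 1$ and $i\,\dot\psi = H\psi$: $J_{j\leftarrow k} = 2\,\operatorname{Im}(\bar\psi_j H_{jk}\psi_k)$ and $\tfrac{d}{dt}|\psi_j|^2 = \sum_k J_{j\leftarrow k}$. The paper's exact definition (sign, normalization, or interaction-picture variant) may differ. The Hamiltonian and state are random, and the finite-difference check uses the exact propagator computed from an eigendecomposition.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 4
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
H = 0.5 * (A + A.conj().T)  # arbitrary Hermitian Hamiltonian (illustrative)
psi = rng.normal(size=N) + 1j * rng.normal(size=N)
psi /= np.linalg.norm(psi)

def currents(psi, H):
    """J[j, k] = 2 Im(conj(psi_j) H_jk psi_k): probability flowing into j from k."""
    return 2.0 * np.imag(psi.conj()[:, None] * H * psi[None, :])

def evolve(psi, H, t):
    """Exact propagator exp(-i H t) psi via the eigendecomposition of Hermitian H."""
    lam, U = np.linalg.eigh(H)
    return U @ (np.exp(-1j * lam * t) * (U.conj().T @ psi))

J = currents(psi, H)

# Central finite difference of the occupation probabilities |psi_j|^2.
eps = 1e-5
p_plus = np.abs(evolve(psi, H, eps)) ** 2
p_minus = np.abs(evolve(psi, H, -eps)) ** 2
dp_dt = (p_plus - p_minus) / (2 * eps)
```

Antisymmetry $J_{k\leftarrow j} = -J_{j\leftarrow k}$ follows from $H_{kj} = \bar H_{jk}$, and it implies that the currents sum to zero, so total probability is conserved while mass moves between individual dimensions.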