Reversal Invariance in Autoregressive Language Models

Mihir Sahasrabudhe

TL;DR

This paper analyzes reversal invariance in autoregressive language modeling, showing that the next-token loss $L_{NLL}$ is invariant to reversing sequences, up to a reparameterization of vocabularies and positional encodings. This explains why models trained on reversed text can match forward-text performance. The paper frames language as inherently directional and argues that this symmetry is a fundamental limitation of likelihood-based pretraining: the entropy rate $h$ is reversal-invariant, while natural language exhibits a nonzero time-reversal divergence $\text{A}$. Through a formal treatment of reversal on strings, tokenizations, and permutation equivariance, the paper demonstrates an exact (up to reparameterization) equivalence between the forward and reversed training problems. An information-theoretic perspective reinforces the claim that AR pretraining optimizes a bidirectional statistic, suggesting that directional understanding must be injected through objective design or asymmetry-aware signals, either during pretraining or during post-training alignment. The work outlines a path toward asymmetry-aware understanding and invites future empirical work to develop and evaluate objectives and architectures that explicitly encode the arrow of language while preserving or enhancing standard language modeling capabilities.
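The entropy-rate claim can be made concrete with a short derivation. This is a sketch under the assumption that text is modeled as a stationary process $(X_t)$ over a finite token alphabet; the symbols $X_t$, $H$, and $n$ are introduced here for illustration and may differ from the paper's notation.

$$
h = \lim_{n \to \infty} \frac{1}{n} H(X_1, \dots, X_n), \qquad H(X_1, \dots, X_n) = H(X_n, \dots, X_1).
$$

Joint entropy does not depend on the order in which the variables are listed, so the reversed process has the same entropy rate $h$. A divergence between the forward and reversed sequence distributions can, by contrast, be strictly positive, which is why $h$ alone cannot register direction.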

Abstract

We formalize a structural property of the causal (autoregressive) language modeling (CLM) objective: reversal invariance. Formally, the next-token prediction loss assigns identical likelihood to a corpus and its reversal, implying that standard CLM pretraining is direction-blind. This symmetry explains why models trained on reversed text can achieve comparable performance to those trained on forward text, despite the inherently time-asymmetric nature of human language and reasoning. We argue that this invariance represents a limitation of current pretraining objectives rather than a benign artifact. If natural language encodes directional dependencies - phonological, morphological, or causal - a symmetric objective may fail to capture them. We therefore propose viewing pretraining through the lens of temporal asymmetry, motivating future work on loss functions and architectures that explicitly model the arrow of language while retaining standard language modeling capacity.
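The likelihood side of this invariance can be checked numerically on a toy example. The sketch below is not the paper's construction: it assumes an arbitrary random joint distribution over short sequences, and the helper names (`random_joint`, `prefix_prob`, `chain_rule_nll`, `reverse_joint`) are hypothetical. It compares the exact left-to-right chain-rule loss of a sequence with the left-to-right loss of its reversal under the reversed distribution.

```python
import itertools
import math
import random


def random_joint(vocab, length, seed=0):
    """Build an arbitrary joint distribution over all sequences of a fixed length."""
    rng = random.Random(seed)
    seqs = list(itertools.product(vocab, repeat=length))
    weights = [rng.random() + 1e-3 for _ in seqs]  # strictly positive weights
    total = sum(weights)
    return {s: w / total for s, w in zip(seqs, weights)}


def prefix_prob(joint, prefix):
    """Marginal probability that a sequence starts with `prefix`."""
    return sum(p for s, p in joint.items() if s[:len(prefix)] == prefix)


def chain_rule_nll(joint, seq):
    """Next-token NLL of `seq`: sum over t of -log p(x_t | x_<t)."""
    nll = 0.0
    for t in range(len(seq)):
        cond = prefix_prob(joint, seq[:t + 1]) / prefix_prob(joint, seq[:t])
        nll -= math.log(cond)
    return nll


def reverse_joint(joint):
    """Distribution of the reversed sequences."""
    return {s[::-1]: p for s, p in joint.items()}


joint = random_joint(vocab=("a", "b", "c"), length=4)
rev = reverse_joint(joint)
seq = ("a", "c", "b", "b")

forward_nll = chain_rule_nll(joint, seq)        # model reads seq left to right
backward_nll = chain_rule_nll(rev, seq[::-1])   # model reads the reversal left to right

print(forward_nll, backward_nll)
```

Both quantities equal the negative log of the same joint probability, so an exact next-token predictor gains nothing from either reading direction. Trained networks only approximate these conditionals, so in practice the two losses need not agree exactly.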
