Reverse Training to Nurse the Reversal Curse

Olga Golovneva; Zeyuan Allen-Zhu; Jason Weston; Sainbayar Sukhbaatar

TL;DR

The paper tackles the reversal curse in LLMs, where facts learned in one direction fail to generalize to the reverse direction. It introduces reverse training, a data-augmentation scheme that doubles the training data by adding reversed copies of the training strings and treats the reverse direction as a second language. Four reversal types are considered: token, word, entity-preserving, and random segment reversal. Through symbolic tasks, biographies, real-world knowledge pre-training, and fictitious facts finetuning, the authors show that reverse training, especially with entity-preserving and random segment reversal, mitigates the reversal curse and can even improve standard forward performance in data-bound setups. The approach yields strong reversal-task performance (e.g., near 100% on NameToDescription in larger models) without compromising forward capabilities, suggesting a practical path to more robust knowledge generalization in LLMs. Overall, reverse training provides a general, low-overhead strategy to reduce directionality biases in language models across pre-training and fine-tuning contexts.

Abstract

Large language models (LLMs) have a surprising failure: when trained on "A has a feature B", they do not generalize to "B is a feature of A", which is termed the Reversal Curse. Even when training with trillions of tokens, this issue still appears due to Zipf's law, and hence would persist even if we trained on the entire internet. This work proposes an alternative training scheme, called reverse training, whereby all words are used twice, doubling the amount of available tokens. The LLM is trained in both forward and reverse directions by reversing the training strings while preserving (i.e., not reversing) chosen substrings, such as entities. We show that data-matched reverse-trained models provide superior performance to standard models on standard tasks, and compute-matched reverse-trained models provide far superior performance on reversal tasks, helping resolve the reversal curse issue.
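
To make the reversal schemes concrete, the following is a minimal sketch of word reversal, entity-preserving reversal, and random segment reversal. It is an illustration under stated assumptions, not the authors' implementation: it operates on whitespace-separated words rather than model tokens, the entity list is supplied by hand instead of coming from an entity detector, and the function names, the `max_len` bound, and the example sentence are all hypothetical.

```python
import random


def reverse_words(text):
    """Word reversal: reverse the order of whitespace-separated words."""
    return " ".join(reversed(text.split()))


def reverse_preserving_entities(text, entities):
    """Reverse word order, keeping each listed entity's internal order intact.

    `entities` is a hand-supplied list of entity strings (an assumption made
    for this sketch; a real pipeline would need some entity detector).
    """
    words = text.split()
    # Try longer entities first so overlapping candidates match greedily.
    entity_words = sorted((e.split() for e in entities), key=len, reverse=True)
    units = []
    i = 0
    while i < len(words):
        for ent in entity_words:
            if ent and words[i:i + len(ent)] == ent:
                units.append(" ".join(ent))  # the entity is one unit
                i += len(ent)
                break
        else:
            units.append(words[i])  # an ordinary word is its own unit
            i += 1
    return " ".join(reversed(units))


def reverse_random_segments(text, max_len, rng):
    """Cut the text into segments of 1..max_len words and reverse their order.

    Word order inside each segment is left unchanged.
    """
    words = text.split()
    segments = []
    i = 0
    while i < len(words):
        n = rng.randint(1, max_len)  # inclusive on both ends
        segments.append(" ".join(words[i:i + n]))
        i += n
    return " ".join(reversed(segments))


def augment(corpus, entities):
    """Return the forward strings plus one reversed copy of each string."""
    return list(corpus) + [reverse_preserving_entities(s, entities) for s in corpus]


# Hypothetical fictitious fact, used only for illustration.
sentence = "Daphne Barrington directed A Journey Through Time"
names = ["Daphne Barrington", "A Journey Through Time"]
print(reverse_words(sentence))
print(reverse_preserving_entities(sentence, names))
print(reverse_random_segments(sentence, 3, random.Random(0)))
```

With plain word reversal the name comes out as "Barrington Daphne", whereas the entity-preserving variant keeps "Daphne Barrington" readable in the reversed string, which is the motivation for preserving chosen substrings. The `augment` helper shows the doubling of the data: every string appears once forward and once reversed.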

Paper Structure (19 sections, 3 equations, 2 figures, 9 tables)

Introduction
Reverse Training
Experiments
Symbolic reverse task
Reversing biography task
Reversing real-world knowledge via pre-training
Reversing fictitious facts via finetuning
Analysis & ablation experiments
Does reversal training hurt performance on standard tasks?
Does the unit of reversal matter?
Related Work
Reversal Curse & Mitigations
Right-to-left, masked & other training variants
Conclusion
Symbolic reverse task details
...and 4 more sections

Figures (2)

Figure 1: Training loss for 1.4B models in the pre-training stage. The $x$-axis shows the total number of tokens the model has been trained on, counting both the standard and reverse directions.
Figure 2: Evaluation results during training on the real-world celebrity task when using different pre-training methods for LLMs.
