Learning to Reason and Memorize with Self-Notes

Jack Lanchantin, Shubham Toshniwal, Jason Weston, Arthur Szlam, Sainbayar Sukhbaatar

TL;DR

The paper tackles the challenge of multi-step reasoning and memory in large language models by introducing Self-Notes, a mechanism that interleaves explicit reasoning tokens with the input context to create working memory while reading. The authors formalize Self-Notes within autoregressive transformers and explore four learning paradigms (Supervised, Semi-supervised, Unsupervised, Few-shot Prompted) across diverse tasks spanning synthetic and real-world domains. Empirically, Self-Notes outperform vanilla baselines and often surpass Chain-of-Thought and Scratchpad methods, with gains in both in-context and length-extended settings, as well as data-efficient improvements in semi-supervised and unsupervised regimes. The results suggest that Self-Notes enable online reasoning and state-tracking with better generalization, and point to future work on discovering optimal notes via reinforcement learning and on zero-shot note generation in pretraining.

Abstract

Large language models have been shown to struggle with multi-step reasoning, and do not retain previous reasoning steps for future use. We propose a simple method for solving both of these problems by allowing the model to take Self-Notes. Unlike recent chain-of-thought or scratchpad approaches, the model can deviate from the input context at any time to explicitly think and write down its thoughts. This allows the model to perform reasoning on the fly as it reads the context and even integrate previous reasoning steps, thus enhancing its memory with useful information and enabling multi-step reasoning. Experiments across a wide variety of tasks demonstrate that our method can outperform chain-of-thought and scratchpad methods by taking Self-Notes that interleave the input text.

Paper Structure (18 sections, 2 equations, 6 figures, 13 tables)

Figures (6)

  • Figure 1: [top] Vanilla language models directly generate the answer (A) given the context and the question (Q). [middle] Scratchpad and Chain-of-Thought methods encourage the model to generate reasoning tokens before answering the question, but only after it has read the entire context. [bottom] Self-Notes (ours) allows the model to generate multiple internal reasoning notes that interleave the input context and question.
  • Figure 2: Performance of the Semi-supervised Self-Notes method with varying amounts of Self-Notes supervision on the Toy-Story and Algorithmic tasks.
  • Figure 3: Unsupervised Self-Notes vs Vanilla sample comparison on 1-variable Algorithmic.
  • Figure 4: 10k Supervised Self-Notes samples vs Vanilla sample comparison on Toy-Story.
  • Figure 5: Comparison of Vanilla vs Self-Notes inference. Non-highlighted tokens are given to the model. Highlighted tokens are generated by the model. In Vanilla inference (top), the model generates only after the full context and question are provided. In Self-Notes inference (bottom), the model is able to generate after every word or sentence in the context. If the next most likely token is the Self-Notes start token "<SN>", then the model autoregressively generates a note for itself until the end token "</SN>" is generated, at which point it returns to reading the original context.
  • ...and 1 more figure
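
The Self-Notes inference procedure described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `next_token` callable, the `toy_next_token` scripted model, the `<READ>` placeholder, and the `max_note_len` cap are all hypothetical names introduced here. Only the `<SN>` and `</SN>` tokens come from the caption.

```python
from typing import Callable, List

SN_START = "<SN>"   # Self-Notes start token (from the Figure 5 caption)
SN_END = "</SN>"    # Self-Notes end token (from the Figure 5 caption)

# Hypothetical stand-in for a language model: maps the tokens seen so far
# to the single most likely next token.
NextToken = Callable[[List[str]], str]


def self_notes_inference(context: List[str], next_token: NextToken,
                         max_note_len: int = 16) -> List[str]:
    """Read the context token by token, letting the model interleave notes."""
    sequence: List[str] = []
    for token in context:
        sequence.append(token)  # token given to the model
        # After each context token, check whether the model wants to start a note.
        if next_token(sequence) != SN_START:
            continue
        sequence.append(SN_START)
        for _ in range(max_note_len):
            note_token = next_token(sequence)  # token generated by the model
            sequence.append(note_token)
            if note_token == SN_END:
                break
        else:
            # Assumption: force-close a note that exceeds the length cap.
            sequence.append(SN_END)
        # Control returns to reading the original context.
    return sequence


def toy_next_token(seq: List[str]) -> str:
    """Scripted toy model (hypothetical): writes one fixed note after sentence two."""
    if seq.count(SN_START) > seq.count(SN_END):
        # Inside an open note: emit the scripted note body, then close it.
        start = len(seq) - 1 - seq[::-1].index(SN_START)
        written = seq[start + 1:]
        note = ["key", "is", "in", "park"]
        return note[len(written)] if len(written) < len(note) else SN_END
    if seq.count(".") == 2 and seq[-1] == ".":
        return SN_START
    return "<READ>"  # placeholder meaning "keep reading the context"


context = "Alice has the key . Alice is in the park . Where is the key ?".split()
enriched = self_notes_inference(context, toy_next_token)
print(" ".join(enriched))
```

In this toy run the note is written immediately after the second sentence, before the question is read, so the inferred fact is already part of the sequence when the question arrives. A real system would replace `toy_next_token` with a trained model's greedy next-token prediction.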