DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability

Yunzhen He, Yusuke Takase, Yoichi Ishibashi, Hidetoshi Shimodaira

TL;DR

DeLTa targets two persistent problems in large language models, factual inaccuracy and flawed reasoning, with a decoding-only strategy that uses the trajectory of logits across Transformer layers. It treats the per-layer logits as a time series and applies simple linear regression to extrapolate them to a virtual upper layer, reshaping next-token probabilities without training or data augmentation. Results across multiple models and benchmarks show consistent improvements in factuality (up to 4.9 percentage points on TruthfulQA and related tasks) and in chain-of-thought (CoT) reasoning (up to 8.1 percentage points on StrategyQA and 7.3 on GSM8K). The method is lightweight and architecture-agnostic. It also highlights that higher-layer logits tend to follow linear trends, which enables reliable logit estimation at a modest latency cost. Generalization to non-English data and to larger models remains open for future work.

Abstract

Large Language Models (LLMs) are increasingly being used in real-world applications. However, concerns about the reliability of the content they generate persist, as it frequently deviates from factual correctness or exhibits deficiencies in logical reasoning. This paper proposes a novel decoding strategy aimed at enhancing both factual accuracy and inferential reasoning without requiring any modifications to the architecture or pre-trained parameters of LLMs. Our approach adjusts next-token probabilities by analyzing the trajectory of logits from lower to higher layers in Transformers and applying linear regression. We find that this Decoding by Logit Trajectory-based approach (DeLTa) effectively reinforces factuality and reasoning while mitigating incorrect generation. Experiments on TruthfulQA demonstrate that DeLTa attains up to a 4.9% improvement over the baseline. Furthermore, it enhances performance by up to 8.1% on StrategyQA and 7.3% on GSM8K, both of which demand strong reasoning capabilities.
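The core idea described above, fitting a line to each token's logit across layers and evaluating it one layer past the last, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `extrapolate_logits` and `softmax`, the choice of three layers, and the toy logit values are all hypothetical, and the paper's actual layer selection and any additional processing are not reproduced here.

```python
import numpy as np

def extrapolate_logits(layer_logits, layer_ids, target_layer):
    """Fit one least-squares line per token over the given layers
    and evaluate each line at target_layer (hypothetical helper)."""
    x = np.asarray(layer_ids, dtype=float)      # shape (L,)
    y = np.asarray(layer_logits, dtype=float)   # shape (L, V)
    x_mean = x.mean()
    y_mean = y.mean(axis=0)
    dx = x - x_mean
    # Closed-form simple linear regression, vectorized over the vocabulary.
    slope = (dx[:, None] * (y - y_mean)).sum(axis=0) / (dx ** 2).sum()
    intercept = y_mean - slope * x_mean
    return slope * target_layer + intercept

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy example with a two-token vocabulary ["Seattle", "Olympia"].
# The logit values are made up for illustration only.
layer_ids = [30, 31, 32]
layer_logits = [
    [2.0, 1.0],   # layer 30
    [1.9, 1.4],   # layer 31
    [1.8, 1.8],   # layer 32 (the two tokens are tied here)
]

virtual_logits = extrapolate_logits(layer_logits, layer_ids, target_layer=33)
probs = softmax(virtual_logits)
```

In this toy input the two tokens are tied at layer 32, but the second token's logit has been rising while the first has been falling, so the extrapolated virtual layer 33 favors the second token. That is the kind of trend the trajectory-based adjustment is meant to exploit.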

Paper Structure

This paper contains 27 sections, 7 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: Overview of DeLTa. When input tokens are fed into the LLM, the logits from each layer (e.g., layers 30, 31, and 32) are computed and shown as bar graphs to illustrate changes between tokens (e.g., "Seattle" vs. "Olympia"). A linear regression (red line) approximates the logit trajectory (blue dots). Using this regression, we extrapolate the logits for a virtual 33rd layer (red dot) and improve prediction beyond the original outputs.
  • Figure 2: Mean coefficient of determination (mean $R^2$) obtained from the experimental procedure. Target LLMs are Qwen2.5-7B, Mistral-v0.1-7B, and Llama-3.1-8B. The vertical axis represents the mean $R^2$, and the horizontal axis represents the ratio $(N_{mid}/N)$ of layer indices.
  • Figure 3: TruthfulQA
  • Figure 4: TriviaQA
  • Figure 5: Natural Questions
  • ...and 2 more figures
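The mean $R^2$ reported in Figure 2 measures how well a straight line explains each token's logit trajectory over a range of layers. The sketch below shows one way such a quantity could be computed. It is an assumption-laden illustration: the function name `mean_r2`, the averaging over tokens, and the handling of constant trajectories are hypothetical choices, and the paper's exact experimental procedure is not reproduced here.

```python
import numpy as np

def mean_r2(layer_logits, layer_ids):
    """Mean coefficient of determination of per-token linear fits
    of logits against layer index (hypothetical helper)."""
    x = np.asarray(layer_ids, dtype=float)      # shape (L,)
    y = np.asarray(layer_logits, dtype=float)   # shape (L, V)
    dx = x - x.mean()
    y_mean = y.mean(axis=0)
    slope = (dx[:, None] * (y - y_mean)).sum(axis=0) / (dx ** 2).sum()
    fitted = y_mean + dx[:, None] * slope       # regression line at each layer
    ss_res = ((y - fitted) ** 2).sum(axis=0)    # residual sum of squares
    ss_tot = ((y - y_mean) ** 2).sum(axis=0)    # total sum of squares
    valid = ss_tot > 0                          # skip constant trajectories
    r2 = 1.0 - ss_res[valid] / ss_tot[valid]
    return r2.mean()

# Toy trajectories: the first token is perfectly linear in the layer index,
# the second has no linear trend at all. Values are made up.
layer_ids = [0, 1, 2]
layer_logits = [
    [0.0, 0.0],
    [1.0, 1.0],
    [2.0, 0.0],
]
score = mean_r2(layer_logits, layer_ids)
```

A value near 1 means the trajectories over the chosen layers are close to linear, which is the condition under which extrapolating to a virtual layer is expected to be reliable.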