Bridging the Knowledge-Prediction Gap in LLMs on Multiple-Choice Questions

Yoonah Park, Haesung Pyun, Yohan Jo

TL;DR

The paper tackles the known discrepancy in which LLMs internalize correct knowledge yet answer multiple-choice questions (MCQs) incorrectly. It uncovers a low-dimensional knowledge–prediction subspace in the residual streams of certain layers, spanned by a knowledge basis and a prediction basis. Building on this geometry, KAPPA applies a parameter-free, inference-time affine transformation that aligns predictions with latent knowledge, yielding substantial gains on binary- and multi-choice MCQs and extending to free-form generation. The approach generalizes across datasets to some extent and maintains or improves general capabilities, offering a practical way to make LLMs use their knowledge more faithfully. The work thus provides both a geometric account of the gap and a lightweight technique for more accurate behavior on knowledge-intensive tasks.
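To make the alignment idea concrete, here is a minimal sketch of one simple reading of it, not the authors' implementation. It assumes the knowledge and prediction bases are available as direction vectors `k` and `p` in the residual-stream space, orthonormalizes them with Gram-Schmidt, and then moves a hidden state only along the prediction axis so that its prediction coordinate equals its knowledge coordinate. The knowledge coordinate and everything outside the 2D subspace stay unchanged. All function names are hypothetical, and the alignment parameters w and β that the paper varies (Figure 5) are omitted.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def axpy(alpha: float, x: Vector, y: Vector) -> Vector:
    """Return alpha * x + y as a new vector."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def normalize(v: Vector) -> Vector:
    """Scale v to unit Euclidean length."""
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def orthonormal_basis(k: Vector, p: Vector) -> Tuple[Vector, Vector]:
    """Gram-Schmidt: e1 points along the knowledge direction k, e2 is the
    part of the prediction direction p orthogonal to e1."""
    e1 = normalize(k)
    e2 = normalize(axpy(-dot(p, e1), e1, p))
    return e1, e2


def align_prediction_to_knowledge(h: Vector, k: Vector, p: Vector) -> Vector:
    """Move h along e2 only, so its prediction coordinate equals its
    knowledge coordinate; the e1 coordinate and the part of h outside
    span(e1, e2) are left unchanged."""
    e1, e2 = orthonormal_basis(k, p)
    c_know = dot(h, e1)  # knowledge coordinate (cf. x-axis in Figure 1b)
    c_pred = dot(h, e2)  # prediction coordinate (cf. y-axis in Figure 1b)
    return axpy(c_know - c_pred, e2, h)


# Toy 3D example: the knowledge coordinate is 2.0 but the prediction
# coordinate is -1.0; after the shift both coordinates are 2.0.
h_new = align_prediction_to_knowledge([2.0, -1.0, 0.5], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(h_new)  # [2.0, 2.0, 0.5]
```

Because the shift is a fixed linear function of the hidden state, it costs only two dot products per intervened position and needs no trained parameters beyond the two directions.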

Abstract

Large Language Models (LLMs) often fail on multiple-choice questions (MCQs) despite demonstrating correct knowledge in other contexts, such as free-form generation. To investigate the mechanism underlying this knowledge-prediction gap on MCQs and alleviate it, we conduct a probing analysis and find that residual streams in certain layers contain a subspace spanned by two important bases: a knowledge basis that encodes the probability of the ground-truth answer for a given MCQ and a prediction basis that encodes the probability of the answer choice predicted by the model. We observe that incorrect predictions arise from a misalignment of the model's hidden states along these two bases. Hence, we introduce KAPPA (Knowledge-Aligned Prediction through Projection-based Adjustment), a parameter-free intervention that transforms the hidden states to align the prediction coordinate with the knowledge coordinate within this subspace. Experiments on binary-choice reformulations of Big-Bench-Hard and ARC-Challenge show that KAPPA substantially improves accuracy and consistently outperforms baselines. While optimal subspaces differ across tasks, subspaces generalize to some extent, as supported by cross-dataset experiments. Moreover, KAPPA extends its effectiveness to free-form questions beyond MCQs. Our work provides a new geometric understanding of the knowledge-prediction gap and offers a practical method for better aligning model behavior with its latent knowledge.
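The probing analysis can be sketched as two logistic-regression probes trained on the same hidden states, one against ground-truth labels (knowledge probe) and one against the model's own predicted labels (prediction probe), as in Figure 2, step 2. This dependency-free sketch uses synthetic 2D "hidden states" in place of real residual-stream activations. The data generator, the plain batch gradient-descent trainer, its hyperparameters, and the helper names `train_probe` and `probe_probability` are illustrative assumptions rather than the paper's code. Each probe's weight vector plays the role of a basis direction, and the sigmoid of its logit is the probability that probe encodes.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_probability(w: Vector, b: float, x: Vector) -> float:
    """Probability the probe assigns to label 1 for hidden state x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(xs: List[Vector], ys: List[int], lr: float = 0.5,
                epochs: int = 300) -> Tuple[Vector, float]:
    """Fit logistic-regression weights and bias by batch gradient descent
    on the mean binary cross-entropy loss."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = probe_probability(w, b, x) - y  # d(loss)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic hidden states: coordinate 0 encodes the ground-truth option,
# coordinate 1 encodes the option the model outputs. The two disagree on
# about 30% of items, mimicking a knowledge-prediction gap.
rng = random.Random(0)
hidden: List[Vector] = []
gold: List[int] = []
predicted: List[int] = []
for _ in range(200):
    g = rng.randint(0, 1)
    m = g if rng.random() < 0.7 else 1 - g
    hidden.append([(2 * g - 1) + rng.gauss(0, 0.3), (2 * m - 1) + rng.gauss(0, 0.3)])
    gold.append(g)
    predicted.append(m)

w_know, b_know = train_probe(hidden, gold)       # knowledge probe
w_pred, b_pred = train_probe(hidden, predicted)  # prediction probe
print("knowledge probe weights:", w_know)
print("prediction probe weights:", w_pred)
```

On this synthetic data the knowledge probe should load mainly on the first coordinate and the prediction probe mainly on the second, mirroring the two bases; on real activations one would fit such probes layer by layer, as reflected in the per-layer probe accuracies of Figure 3.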

Paper Structure

This paper contains 51 sections, 30 equations, 11 figures, 14 tables.

Figures (11)

  • Figure 1: KAPPA resolves the knowledge-prediction gap through geometric realignment. The examples provided here use the binary-choice setting to facilitate intuitive understanding. (a) Motivating example: an LLM answers correctly in free-form but fails in MCQ format. 3D visualization shows hidden representations. (b) In the 2D knowledge-prediction subspace, the circled point shows misalignment between knowledge (x-axis) and prediction (y-axis) coordinates. (c) KAPPA geometrically transforms representations to align coordinates. (d) This correction enables faithful expression of internal knowledge, producing the correct answer.
  • Figure 2: KAPPA pipeline overview. (1) Collect residual stream activations from transformer layers during inference on binary-choice questions. (2) Train knowledge and prediction probes: logistic regression classifiers that predict ground-truth labels and model outputs, respectively, from the same hidden states. (3) At inference, KAPPA applies a geometric transformation to the hidden representation. (4) 2D visualization showing the distribution of representations before (left) and after (right) the intervention in the knowledge-prediction subspace.
  • Figure 3: Accuracy of knowledge probes (blue) and prediction probes (orange) across transformer layers for LLaMA-2 and Qwen-2.5 on BBH-Binary, MMLU-Binary, and ARC-Challenge-Binary datasets. Dashed lines indicate base model accuracy for each dataset.
  • Figure 4: (a) Ratio of "MCQ wrong, free-form-correct" cases for each method. (b) Performance of each method on free-form version of BBH dataset.
  • Figure 5: Effect of alignment parameters $w$ and $\beta$ on model accuracy. (Left, Middle) Qwen-2.5 7B Instruct results across five $w$ values (0, 2, 4, 6, 8) and four $\beta$ values (0, 2, 4, 6). (Right) Llama-2 7B Chat results when varying only $w$ ($\beta=0$) versus only $\beta$ ($w=0$). Across both models, increasing either parameter consistently boosts accuracy.
  • ...and 6 more figures
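The experiments use binary-choice reformulations of multiple-choice benchmarks (the "-Binary" datasets in Figure 3). This summary does not say how they were built, so the following is only a hypothetical sketch of one common construction: keep the gold option, sample one distractor, and shuffle the pair so the correct letter is not always A. The `MCQ` and `BinaryMCQ` types, `to_binary`, `format_prompt`, and the prompt template are all illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class MCQ:
    question: str
    options: List[str]
    answer_index: int  # index of the gold option in `options`


@dataclass
class BinaryMCQ:
    question: str
    options: List[str]  # exactly two options, shown as A and B
    answer_index: int   # 0 means "A", 1 means "B"


def to_binary(item: MCQ, rng: random.Random) -> BinaryMCQ:
    """Hypothetical reformulation: gold option plus one random distractor,
    shuffled so the gold answer's position varies across items."""
    gold = item.options[item.answer_index]
    distractors = [o for i, o in enumerate(item.options) if i != item.answer_index]
    # Track gold membership with a flag rather than by text, so duplicate
    # option strings cannot confuse the label.
    labeled = [(gold, True), (rng.choice(distractors), False)]
    rng.shuffle(labeled)
    options = [text for text, _ in labeled]
    answer = next(i for i, (_, is_gold) in enumerate(labeled) if is_gold)
    return BinaryMCQ(item.question, options, answer)


def format_prompt(item: BinaryMCQ) -> str:
    """Render a binary item as a simple A/B prompt (template is illustrative)."""
    return (f"Question: {item.question}\n"
            f"A. {item.options[0]}\n"
            f"B. {item.options[1]}\n"
            "Answer:")


mcq = MCQ("Which planet is the largest?", ["Mars", "Jupiter", "Venus", "Earth"], 1)
binary_item = to_binary(mcq, random.Random(0))
print(format_prompt(binary_item))
```

Seeding a `random.Random` instance keeps the construction reproducible while still balancing where the gold answer appears across items.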