Listening to the Wise Few: Select-and-Copy Attention Heads for Multiple-Choice QA

Eduard Tulchinskii, Laida Kushnareva, Kristian Kuznetsov, Anastasia Voznyuk, Andrei Andriiainen, Irina Piontkovskaya, Evgeny Burnaev, Serguei Barannikov

TL;DR

The paper tackles the mismatch between what a model knows and what the MCQA format lets it express by exploiting select-and-copy attention heads. It introduces the QK-score $S^{(l,h)}_{QK}$ and the Attention-score $S^{(l,h)}_{Att}$ to quantify option selection at the level of individual heads, using the end-of-line token of each option as its representative. The authors identify select-and-copy heads that perform consistently across models from $7\times 10^9$ to $70\times 10^9$ parameters and report accuracy gains of up to $16\%$ on real MCQA benchmarks and of almost $60\%$ on a synthetic task, together with stability to option order. The work advances mechanistic interpretability by linking internal attention dynamics to task performance, and it suggests a way to extract latent knowledge from LLMs beyond conventional logits.

Abstract

A standard way to evaluate the abilities of LLMs involves presenting a multiple-choice question and selecting the option with the highest logit as the model's predicted answer. However, this evaluation format has limitations: even if the model knows the correct answer, it may struggle to select the corresponding letter simply because it has difficulty following the rigid format. To address this, we introduce new scores that better capture and reveal the model's underlying knowledge: the Query-Key Score (QK-score), derived from the interaction between query and key representations in attention heads, and the Attention Score, based on attention weights. These scores are extracted from specific select-and-copy heads, which show consistent performance across popular Multi-Choice Question Answering (MCQA) datasets. Based on these scores, our method improves knowledge extraction, yielding a gain of up to 16% for LLaMA2-7B and up to 10% for larger models on popular MCQA benchmarks. At the same time, the accuracy on a simple synthetic dataset, where the model explicitly knows the right answer, increases by almost 60%, reaching nearly perfect accuracy and thereby demonstrating the method's efficiency in mitigating MCQA format limitations. To support our claims, we conduct experiments on models ranging from 7 billion to 70 billion parameters in both zero- and few-shot setups.
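
To make the two scores concrete, here is a minimal NumPy sketch for a single attention head. It is an illustration under stated assumptions, not the authors' implementation: the function name `select_and_copy_scores`, the argument layout, and the use of plain scaled dot-product attention (no positional encoding, query taken from the last prompt token) are all hypothetical choices made for this example.

```python
import numpy as np

def select_and_copy_scores(hidden, W_q, W_k, option_positions):
    """Score answer options with one attention head (illustrative sketch).

    hidden:           (seq_len, d_model) inputs to the head's layer
    W_q, W_k:         (d_model, d_head) query/key projections of that head
    option_positions: index of the representative token of each option
                      (e.g. its end-of-line token)
    Returns (qk_scores, attention_scores), one value per option.
    """
    option_positions = np.asarray(option_positions)
    d_head = W_q.shape[1]

    q_last = hidden[-1] @ W_q          # query of the last prompt token
    keys = hidden @ W_k                # keys of every token in the prompt

    # QK-score (assumed form): raw query-key dot product at option tokens.
    dots = keys @ q_last
    qk_scores = dots[option_positions]

    # Attention-score (assumed form): softmax attention weight that the
    # last token places on each option token. The last token can attend
    # to all positions, so no causal mask is needed here.
    logits = dots / np.sqrt(d_head)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    attention_scores = weights[option_positions]

    return qk_scores, attention_scores

def predict_option(scores):
    """Predicted answer = option with the highest score."""
    return int(np.argmax(scores))
```

In this simplified setting both scores rank the options identically, because softmax is monotone in the dot products; the sketch omits positional encodings and other model-specific details. Choosing which head to read the scores from is a separate step that the sketch does not cover.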

Paper Structure (25 sections, 7 equations, 26 figures, 5 tables)

Figures (26)

  • Figure 1: Our method calculates the Query-Key score between the end-of-line token of an answer option and the last token of the prompt for the designated head, from which we derive the answer.
  • Figure 2: (a) Scheme for option-representative token types. (b) Performance of QK-score and Attention-score for different option-representative tokens on Llama2-7B base.
  • Figure 3: Comparison of different methods for LLaMA2-7B (base) on various Q&A datasets. Reported metrics are Accuracy (Acc) and Permutation Accuracy (PA).
  • Figure 4: Zero-ablation of heads for LLaMA2-7B (upper) and LLaMA3-8B (lower)
  • Figure 5: (a) Heatmap of (layer, head) indices for the best-performing heads in LLaMA2-7B. The top 5% of heads were selected for each N-shot setup, with all 4 datasets combined. The intensity of color indicates the maximal N where this pair appears. The framed cells indicate best-performing pairs that are uniform for 3 or 4 datasets. The first 8 layers are omitted because no interesting heads are found there. (b) Synthetic dataset QK-score accuracy for various numbers of options (the number of options is plotted on the x-axis and varies from 0 to 24) in zero-shot for LLaMA2-7B. Different line colors correspond to different heads. "Square" markers correspond to the heads that perform well across real datasets (they are "framed" in panel (a)), and "round" markers correspond to the heads that work well on the synthetic dataset specifically. The "triangle"-marked dotted line reflects the baseline model's performance.
  • ...and 21 more figures