NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models

Zheng Yi Ho, Siyuan Liang, Sen Zhang, Yibing Zhan, Dacheng Tao

TL;DR

NoVo harnesses the untapped potential of attention head norms to dramatically enhance factual accuracy in zero-shot multiple-choice questions (MCQs), opening new frontiers in LLM interpretability, robustness and reliability.

Abstract

Hallucinations in Large Language Models (LLMs) remain a major obstacle, particularly in high-stakes applications where factual accuracy is critical. While representation editing and reading methods have made strides in reducing hallucinations, their heavy reliance on specialised tools and on training with in-domain samples makes them difficult to scale and prone to overfitting. This limits their accuracy gains and their generalizability to diverse datasets. This paper presents a lightweight method, Norm Voting (NoVo), which harnesses the untapped potential of attention head norms to dramatically enhance factual accuracy in zero-shot multiple-choice questions (MCQs). NoVo begins by automatically selecting truth-correlated head norms with an efficient, inference-only algorithm that uses only 30 random samples, allowing NoVo to scale easily to diverse datasets. The selected head norms are then used in a simple voting algorithm, which yields significant gains in prediction accuracy. On TruthfulQA MC1, NoVo surpasses the current state of the art and all previous methods by a margin of at least 19 accuracy points. NoVo demonstrates exceptional generalization to 20 diverse datasets, with significant gains in over 90% of them, far exceeding all current representation editing and reading methods. NoVo also shows promising gains when combined with finetuning strategies and when used to build textual adversarial defences. NoVo's effectiveness with head norms opens new frontiers in LLM interpretability, robustness and reliability.
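
The two stages described above (selecting truth-correlated head norms from a few labelled samples, then letting the selected heads vote) can be sketched in plain Python. This is a minimal sketch under our own assumptions, not the authors' implementation: we assume each MCQ has already been converted into a norm matrix `norms[c][k]` (candidate answer `c`, head `k` enumerated as a single integer), and we keep a head when the better of its argmax/argmin accuracy on the samples reaches a hypothetical `threshold`. The paper's exact selection criterion may differ, and the names `select_heads`, `vote` and `threshold` are illustrative only.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# norms[c][k]: norm of enumerated head k when candidate answer c is appended.
NormMatrix = Sequence[Sequence[float]]


def _argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def _argmin(xs: Sequence[float]) -> int:
    return min(range(len(xs)), key=lambda i: xs[i])


def select_heads(samples: List[Tuple[NormMatrix, int]], threshold: float) -> List[Tuple[int, int]]:
    """Selection stage (sketch): return (head_index, indicator) pairs.

    indicator = +1 if the largest norm tends to mark the true answer,
    indicator = -1 if the smallest norm does.
    """
    num_heads = len(samples[0][0][0])
    selected = []
    for k in range(num_heads):
        hits_max = hits_min = 0
        for norms, label in samples:
            column = [cand[k] for cand in norms]
            hits_max += _argmax(column) == label
            hits_min += _argmin(column) == label
        acc_max, acc_min = hits_max / len(samples), hits_min / len(samples)
        best, indicator = (acc_max, 1) if acc_max >= acc_min else (acc_min, -1)
        if best >= threshold:  # hypothetical selection rule
            selected.append((k, indicator))
    return selected


def vote(norms: NormMatrix, selected: List[Tuple[int, int]]) -> int:
    """Voting stage (sketch): each selected head votes; the most-voted candidate wins."""
    if not selected:
        raise ValueError("no heads selected")
    votes = Counter()
    for k, indicator in selected:
        column = [cand[k] for cand in norms]
        votes[_argmax(column) if indicator > 0 else _argmin(column)] += 1
    return votes.most_common(1)[0][0]


# Toy data: 3 labelled questions, 2 candidates each, 3 heads.
samples = [
    ([[0.9, 0.1, 0.5], [0.2, 0.8, 0.4]], 0),
    ([[0.3, 0.7, 0.6], [0.8, 0.2, 0.1]], 1),
    ([[0.1, 0.9, 0.2], [0.7, 0.3, 0.9]], 1),
]
selected = select_heads(samples, threshold=0.9)
print(selected)  # [(0, 1), (1, -1)]
print(vote([[0.2, 0.9, 0.5], [0.6, 0.4, 0.5]], selected))  # 1
```

In the toy data, head 0 favours the true answer with a larger norm and head 1 with a smaller norm, while head 2 is inconsistent and is dropped. A real run would use the 30 random samples mentioned above; the selection step needs only forward passes, with no gradient updates.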

Paper Structure

This paper contains 22 sections, 3 equations, 13 figures, 12 tables.

Figures (13)

  • Figure 1: Overview of our method. NoVo improves factuality in diverse MCQs.
  • Figure 2: The Norm Matrix (right) contains all $T^{l,h}$ values taken throughout the LLM. It cannot be used to answer MCQs on its own; instead, the operation that produces it forms the basic building block of NoVo.
  • Figure 3: The selection stage uses the Norm Matrix from Figure 2 to determine the correlation direction of each $T^{l,h}$, serialised as Indicators. All $(l,h)$ indices that vary with truth are also specified in the Index Vector, expressed as enumerated integers for clarity.
  • Figure 4: The voting stage uses the Norm Matrix from Figure 2, and the Indicators and Index Vector from Figure 3, to accurately answer MCQs during LLM inference.
  • Figure 5: Attention-weighted value state components at various sequence positions.
  • ...and 8 more figures
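
Figures 2 and 5 suggest how each $T^{l,h}$ entry of the Norm Matrix is obtained. The sketch below rests on our own reading rather than a definition stated here: we assume $T^{l,h}$ is the L2 norm of head $h$'s attention output in layer $l$ at the final token, that is, the norm of the attention-weighted sum of value states, computed with standard scaled dot-product attention. `head_norm` and `flatten_index` are hypothetical helpers; in practice the query, key and value vectors would be read out of the model during a forward pass.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_norm(query: Vector, keys: Sequence[Vector], values: Sequence[Vector]) -> float:
    """Assumed T^{l,h}: L2 norm of one head's output at the query position.

    `keys` and `values` cover the positions the query may attend to
    (the causal prefix, including the query position itself).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # Attention-weighted value state components (cf. Figure 5), summed over positions.
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return math.sqrt(sum(x * x for x in out))


def flatten_index(layer: int, head: int, num_heads: int) -> int:
    """Enumerate (layer, head) pairs as single integers, row-major."""
    return layer * num_heads + head


# Toy example: identical keys give uniform attention over two positions.
print(head_norm([1.0, 0.0], [[0.0, 0.0], [0.0, 0.0]], [[2.0, 0.0], [0.0, 2.0]]))  # prints 1.4142135623730951
```

Calling `head_norm` for every layer and head, once per candidate answer, and storing each result at `flatten_index(l, h, num_heads)` yields one row of a Norm Matrix, with heads enumerated as integers in the same spirit as the Index Vector of Figure 3.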