How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective

Qi Liu; Jiaxin Mao; Ji-Rong Wen

TL;DR

This work uses activation patching, a mechanistic interpretability technique, to probe how large language models internalize relevance signals for information retrieval. It uncovers a progressive information flow: early layers extract query and document content, middle layers process relevance according to the task instructions, and late layers use specific attention heads to produce the final relevance judgments in the required format. The study identifies a sparse set of attention heads that mediate query–document interactions and demonstrates that the same mechanism operates across pointwise and pairwise prompts and across multiple models and datasets. These findings provide a principled view of LLM-based IR and offer guidance for designing more interpretable and reliable relevance assessment systems.

Abstract

Recent studies have shown that large language models (LLMs) can assess relevance and support information retrieval (IR) tasks such as document ranking and relevance judgment generation. However, the internal mechanisms by which off-the-shelf LLMs understand and operationalize relevance remain largely unexplored. In this paper, we systematically investigate how different LLM modules contribute to relevance judgment through the lens of mechanistic interpretability. Using activation patching techniques, we analyze the roles of various model components and identify a multi-stage, progressive process in generating either pointwise or pairwise relevance judgments. Specifically, LLMs first extract query and document information in the early layers, then process relevance information according to instructions in the middle layers, and finally utilize specific attention heads in the later layers to generate relevance judgments in the required format. Our findings provide insights into the mechanisms underlying relevance assessment in LLMs, offering valuable implications for future research on leveraging LLMs for IR tasks.
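For readers unfamiliar with activation patching, the following is a minimal sketch of the general idea on a toy model, not the authors' experimental setup. It runs a "clean" and a "corrupted" input, then re-runs the corrupted input while substituting one component's cached clean output, and reports how much of the clean result is restored. The component names (`L0.head_a`, `L0.head_b`, `L1.head_c`), the three-feature residual stream, and the choice of feature 1 as a stand-in relevance logit are all hypothetical and chosen only for illustration.

```python
# Toy residual-stream "model": each component reads the current stream and
# adds its output back in. All components and names here are hypothetical.

def head_a(x):
    # Copies feature 0 (the "query/document signal") into feature 2.
    return [0.0, 0.0, x[0]]

def head_b(x):
    # Ignores its input and writes a constant bias into feature 1.
    return [0.0, 0.1, 0.0]

def head_c(x):
    # Reads feature 2 and writes a scaled copy into feature 1 (the "logit").
    return [0.0, 2.0 * x[2], 0.0]

COMPONENTS = [("L0.head_a", head_a), ("L0.head_b", head_b), ("L1.head_c", head_c)]

def run(x, patch=None, cache=None):
    """Run the toy model and return feature 1, the stand-in relevance logit.

    patch: optional dict mapping component name -> output to substitute.
    cache: optional dict that is filled with each component's output.
    """
    stream = list(x)
    for name, fn in COMPONENTS:
        out = fn(stream)
        if patch is not None and name in patch:
            out = patch[name]  # overwrite this component's activation
        if cache is not None:
            cache[name] = out
        stream = [s + o for s, o in zip(stream, out)]
    return stream[1]

def patching_effects(x_clean, x_corrupt):
    """For each component, patch its clean output into the corrupted run and
    return the fraction of the clean-vs-corrupted logit gap that is restored."""
    clean_cache = {}
    clean_logit = run(x_clean, cache=clean_cache)
    corrupt_logit = run(x_corrupt)
    gap = clean_logit - corrupt_logit
    effects = {}
    for name, _ in COMPONENTS:
        patched_logit = run(x_corrupt, patch={name: clean_cache[name]})
        effects[name] = (patched_logit - corrupt_logit) / gap
    return effects

X_CLEAN = [1.0, 0.0, 0.0]    # input carrying the signal of interest
X_CORRUPT = [0.0, 0.0, 0.0]  # same input with the signal removed

if __name__ == "__main__":
    for name, effect in patching_effects(X_CLEAN, X_CORRUPT).items():
        print(f"{name}: restored fraction = {effect:.2f}")
```

In this toy setup, components on the path that carries the signal restore the clean output when patched, while the constant-bias component restores nothing. With a real LLM, the same loop would typically be implemented with forward hooks over attention heads and MLP blocks, and the metric would be computed on the model's output logits.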
