Counterfactual Editing for Search Result Explanation

Zhichao Xu; Hemank Lamba; Qingyao Ai; Joel Tetreault; Alex Jaimes

TL;DR

This work formalizes counterfactual explanations for web search by defining pairwise explanations in which a lower-ranked document $d'$ could outrank a higher-ranked document $d$ under a counterfactual query $q'$. It introduces CFE2, a model-agnostic editing framework comprising a Search Model $\mathbf{S}$, a Masker Model $\mathbf{M}$, and an Editor Model $\mathbf{E}$, and uses iterative masking with beam search to generate fluent, close counterfactual queries that flip the pairwise relevance $rel(q,d)$ to favor $d'$. The authors propose a concrete evaluation suite with desiderata (Max #Flips, Closeness, Fluency, Low Latency) and automatic metrics (FlipRate, CosSim, BERTScore-F1, RelFluency, Runtime), complemented by human evaluation. Empirical results on MS MARCO and BEIR-domain datasets show that CFE2 consistently improves counterfactual flip rates and semantic quality, with CFE2+ offering additional fluency gains at the cost of some closeness in certain settings. The findings suggest counterfactual explanations can provide actionable, low-latency, and user-friendly insights for search sessions, enabling query reformulation as a practical explanation modality.
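The pairwise setting described in this summary can be written as a small condition. The block below is a sketch reconstructed from the summary alone, not the paper's own numbered equation; the strict inequalities and the reading of $rel(\cdot,\cdot)$ as the score assigned by the search model are assumptions.

```latex
% Assumed notation: rel(q, d) is the relevance score that the search model S
% assigns to document d for query q; d is ranked above d' for the original q.
\[
  rel(q, d) > rel(q, d')
  \quad\text{and}\quad
  rel(q', d') > rel(q', d)
\]
```

Under this reading, $q'$ counts as a counterfactual explanation for the triplet $(q, d, d')$ when it reverses the order of the pair while staying close to $q$ and remaining fluent.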

Abstract

Search Result Explanation (SeRE) aims to improve the effectiveness and efficiency of search sessions by helping users interpret documents' relevance. Existing work mostly focuses on factual explanation, i.e., finding or generating supporting evidence for a document's relevance to a search query. However, research in cognitive science has shown that human explanations are contrastive, i.e., people explain an observed event by reference to counterfactual events; such explanations reduce cognitive load and provide actionable insights. Although counterfactual explanations have proven effective in the machine learning and NLP communities, a strict formulation of how they should be defined and structured in the context of web search is still lacking. In this paper, we first discuss possible formulations of counterfactual explanations in the IR context. Next, we formulate a suite of desiderata for counterfactual explanation in the SeRE task, together with corresponding automatic metrics. Guided by these desiderata, we propose a method named \textbf{C}ounter\textbf{F}actual \textbf{E}diting for Search Result \textbf{E}xplanation (\textbf{CFE2}). CFE2 provides pairwise counterfactual explanations for document pairs within a search engine result page. Our experiments on five public search datasets demonstrate that CFE2 can significantly outperform baselines in both automatic metrics and human evaluations.
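To make the automatic metrics more concrete, here is a minimal sketch of how a flip rate and a cosine-similarity closeness score might be computed. It is not the paper's implementation: `toy_rel` is a hypothetical term-overlap stand-in for the search model, the flip rate is assumed to be the fraction of triplets whose ranking reverses, and the cosine similarity is taken over bag-of-words counts rather than learned embeddings.

```python
import math
from collections import Counter


def toy_rel(query, doc):
    """Hypothetical stand-in for the search model: query-term counts in doc."""
    doc_counts = Counter(doc.lower().split())
    return sum(doc_counts[t] for t in set(query.lower().split()))


def is_flip(rel, q_cf, d, d_prime):
    """True when the counterfactual query ranks d' strictly above d."""
    return rel(q_cf, d_prime) > rel(q_cf, d)


def flip_rate(rel, examples):
    """Assumed definition: share of (q_cf, d, d') examples whose ranking flips.

    q_cf is None when no counterfactual query was produced for that example.
    """
    if not examples:
        return 0.0
    flips = sum(
        1 for q_cf, d, d_prime in examples
        if q_cf is not None and is_flip(rel, q_cf, d, d_prime)
    )
    return flips / len(examples)


def cos_sim(q, q_cf):
    """Cosine similarity over bag-of-words counts (toy proxy for embeddings)."""
    a, b = Counter(q.lower().split()), Counter(q_cf.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Illustrative data only; these documents and queries are made up.
d = "python is a popular programming language"
d_prime = "the python snake lives in tropical forests"
examples = [("python snake habitat", d, d_prime), (None, d, d_prime)]
print(flip_rate(toy_rel, examples))
print(cos_sim("python language", "python snake"))
```

A higher flip rate means more pairs received a working counterfactual query, while a higher similarity means the edited query stayed closer to the original; the two goals pull against each other, which is why the desiderata list them separately.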

Paper Structure (16 sections, 1 equation, 3 figures, 7 tables, 1 algorithm)

Introduction
Related Work
Motivation and Problem Statement
Evaluation Principles and Metrics
Desiderata for Counterfactual Explanation for Web Search
Automatic Evaluation Metrics
The Proposed CFE2
Search Model $\mathbf{S}$
Masker Model $\mathbf{M}$
Editing Algorithm
Experiments Setup
Results and Analysis
Main Results
Human Evaluation
Ablation Studies
...and 1 more section

Figures (3)

Figure 1: An overview of CFE2. The overview panel shows a sample workflow of one edit. CFE2 takes a query as input; then [1.A] the search model $\mathbf{S}$ generates a SERP; for a document pair $(d,d')$, [1.B] the masker $\mathbf{M}$ generates importance scores for the query tokens; and [1.C] the editor $\mathbf{E}$ performs the edit. The editing panel shows the editing loop in more detail. [2.A] The most important query token is masked, and the masked query is prepended to the counterfactual document and fed to the editor $\mathbf{E}$; [2.B] the editor predicts replacement words, stores the candidates in the beam $\mathcal{B}$, and checks whether the ranking flips. If it does not flip, [2.C] one more token is masked and word prediction/decoding runs again; if it flips, [2.D] editing is complete and the editor outputs the counterfactual query with the lowest perplexity, which serves as a counterfactual explanation for the initial $(q,d,d')$ triplet.
Figure 2: Screenshot of Annotation Task. The last question functions as an attention check.
Figure 3: Effect of beam size on FiQA dataset where x-axis denotes beam size. Runtime is measured by #seconds/edit. The line of CosSim overlaps the line of BERTScore-F1.
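The editing loop in the Figure 1 caption (mask, predict, check flip, mask one more token) can be sketched with toy components. Everything below is a hypothetical stand-in, not the paper's models: `toy_rel` replaces the search model with term overlap, `toy_importance` replaces the masker with a count difference, `toy_fill` replaces the editor with frequent words from the counterfactual document, and `toy_perplexity` is a unigram perplexity used only to rank candidates.

```python
import math
from collections import Counter
from itertools import product


def tokens(text):
    return text.lower().split()


def toy_rel(query, doc):
    """Stand-in for the search model S: query-term counts in doc."""
    counts = Counter(tokens(doc))
    return sum(counts[t] for t in set(tokens(query)))


def toy_importance(query, d, d_prime):
    """Stand-in for the masker M: tokens favoring d over d' score higher."""
    cd, cp = Counter(tokens(d)), Counter(tokens(d_prime))
    return [cd[t] - cp[t] for t in tokens(query)]


def toy_fill(kept_tokens, d_prime, k):
    """Stand-in for the editor E: up to k frequent words of d' not yet in the query."""
    present = set(kept_tokens)
    ranked = [w for w, _ in Counter(tokens(d_prime)).most_common()]
    return [w for w in ranked if w not in present][:k]


def toy_perplexity(query, d_prime):
    """Unigram perplexity of query under add-one-smoothed counts from d'."""
    counts = Counter(tokens(d_prime))
    denom = sum(counts.values()) + len(counts) + 1
    logp = [math.log((counts[t] + 1) / denom) for t in tokens(query)]
    return math.exp(-sum(logp) / len(logp))


def edit_query(query, d, d_prime, beam_size=3):
    """Mask one more token per round until some candidate flips the pair."""
    q_tokens = tokens(query)
    scores = toy_importance(query, d, d_prime)
    order = sorted(range(len(q_tokens)), key=lambda i: -scores[i])
    for n_masked in range(1, len(q_tokens) + 1):
        positions = order[:n_masked]  # most important tokens first
        kept = [t for i, t in enumerate(q_tokens) if i not in positions]
        fills = toy_fill(kept, d_prime, beam_size)
        candidates = []
        for combo in product(fills, repeat=n_masked):
            cand = list(q_tokens)
            for pos, word in zip(positions, combo):
                cand[pos] = word
            candidates.append(" ".join(cand))
        # Keep only the beam_size most fluent candidates in the beam.
        beam = sorted(candidates, key=lambda c: toy_perplexity(c, d_prime))[:beam_size]
        flipped = [c for c in beam if toy_rel(c, d_prime) > toy_rel(c, d)]
        if flipped:
            return min(flipped, key=lambda c: toy_perplexity(c, d_prime))
    return None  # no counterfactual query found


# Illustrative data only; these documents and the query are made up.
query = "python programming tutorial"
d = "python programming tutorial for beginners learning programming"
d_prime = "python snake habitat and python snake diet"
print(edit_query(query, d, d_prime))
```

Masking the fewest tokens first is what keeps the edited query close to the original, and choosing the lowest-perplexity candidate among those that flip is a simple proxy for fluency. In a real system the three toy functions would be replaced by a retrieval model, an attribution method, and a masked or generative language model.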
