Counterfactual Editing for Search Result Explanation

Zhichao Xu, Hemank Lamba, Qingyao Ai, Joel Tetreault, Alex Jaimes

TL;DR

This work formalizes counterfactual explanations for web search by defining pairwise explanations where a lower-ranked document could outrank a higher-ranked one under a counterfactual query $q'$. It introduces CFE2, a model-agnostic editing framework comprising a Search Model $\mathbf{S}$, a Masker Model $\mathbf{M}$, and an Editor Model $\mathbf{E}$, and uses iterative masking with beam search to generate fluent, close counterfactual queries that flip the pairwise relevance $rel(q,d)$ to favor $d'$. The authors propose a concrete evaluation suite with desiderata (Max #Flips, Closeness, Fluency, Low Latency) and automatic metrics (FlipRate, CosSim, BERTScore-F1, RelFluency, Runtime), complemented by human evaluation. Empirical results on MS MARCO and BEIR-domain datasets show that CFE2 consistently improves counterfactual flip rates and semantic quality, with CFE2+ offering additional fluency gains at the cost of some closeness in certain settings. The findings suggest counterfactual explanations can provide actionable, low-latency, and user-friendly insights for search sessions, enabling query reformulation as a practical explanation modality.
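The pairwise flip condition can be illustrated with a minimal sketch. The summary names FlipRate but does not define it, so the code assumes it is the fraction of (counterfactual query, $d$, $d'$) examples for which $d'$ scores above $d$. The function names and the word-overlap relevance function `overlap_rel` are hypothetical stand-ins, not the paper's search model.

```python
from typing import Callable, Iterable, Tuple

Rel = Callable[[str, str], float]


def overlap_rel(query: str, doc: str) -> float:
    """Toy relevance (an assumption): number of query words found in the document."""
    doc_words = set(doc.split())
    return float(sum(word in doc_words for word in query.split()))


def is_flip(rel: Rel, q_cf: str, d: str, d_cf: str) -> bool:
    """True if the counterfactual query q_cf ranks d_cf strictly above d."""
    return rel(q_cf, d_cf) > rel(q_cf, d)


def flip_rate(rel: Rel, examples: Iterable[Tuple[str, str, str]]) -> float:
    """Assumed definition: share of (q_cf, d, d_cf) examples that flip."""
    examples = list(examples)
    if not examples:
        return 0.0
    return sum(is_flip(rel, q, d, d_cf) for q, d, d_cf in examples) / len(examples)


d = "python snake habitat"
d_cf = "python programming tutorial"
examples = [("python tutorial", d, d_cf), ("python snake", d, d_cf)]
rate = flip_rate(overlap_rel, examples)
```

In this toy setup only the first counterfactual query moves `d_cf` above `d`, so half of the examples count as flips.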

Abstract

Search Result Explanation (SeRE) aims to improve the effectiveness and efficiency of search sessions by helping users interpret documents' relevance. Existing work mostly focuses on factual explanation, i.e., finding or generating supporting evidence for a document's relevance to a search query. However, research in the cognitive sciences has shown that human explanations are contrastive, i.e., people explain an observed event using counterfactual events; such explanations reduce cognitive load and provide actionable insights. Although counterfactual explanations have already proven effective in the machine learning and NLP communities, a strict formulation of how they should be defined and structured in the context of web search is still lacking. In this paper, we first discuss possible formulations of counterfactual explanations in the IR context. Next, we formulate a suite of desiderata for counterfactual explanation in the SeRE task, along with corresponding automatic metrics. Guided by these desiderata, we propose a method named CounterFactual Editing for Search Result Explanation (CFE2). CFE2 provides pairwise counterfactual explanations for document pairs within a search engine result page. Our experiments on five public search datasets demonstrate that CFE2 can significantly outperform baselines in both automatic metrics and human evaluations.

Paper Structure (16 sections, 1 equation, 3 figures, 7 tables, 1 algorithm)

This paper contains 16 sections, 1 equation, 3 figures, 7 tables, 1 algorithm.

Figures (3)

  • Figure 1: An overview of CFE2. The first panel (overview) shows a sample workflow of one edit: CFE2 takes a query as input, then [1.A] the search model $\mathbf{S}$ generates a SERP; for a document pair $(d,d')$, [1.B] the masker $\mathbf{M}$ generates importance scores, and [1.C] the editor $\mathbf{E}$ performs the edit. The second panel (editing) shows the editing loop in more detail. [2.A] The most important query token is masked, and the masked query is prepended to the counterfactual document and fed to the editor $\mathbf{E}$; [2.B] the editor predicts replacement words, stores the candidates in the beam $\mathcal{B}$, and checks whether the ranking flips. If it does not flip, [2.C] one more token is masked and word prediction/decoding runs again; if it flips, [2.D] editing is complete and the editor outputs the counterfactual query with the lowest perplexity, which serves as the counterfactual explanation for the initial $(q,d,d')$ triplet.
  • Figure 2: Screenshot of Annotation Task. The last question functions as an attention check.
  • Figure 3: Effect of beam size on the FiQA dataset; the x-axis denotes beam size. Runtime is measured in seconds per edit. The CosSim line overlaps the BERTScore-F1 line.
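
The editing loop described in the Figure 1 caption (steps 2.A to 2.D) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: importance scores are computed once and kept fixed across rounds, masks are filled left to right, and `toy_rel`, `toy_fill_mask`, and `toy_perplexity` are hypothetical stand-ins for the search model, editor, and fluency scorer.

```python
from typing import Callable, List, Optional, Sequence

MASK = "[MASK]"
Tokens = List[str]


def edit_query(
    query: Sequence[str],
    d: str,
    d_cf: str,
    importance: Sequence[float],
    fill_mask: Callable[[Tokens, int, str], List[str]],
    rel: Callable[[Tokens, str], float],
    perplexity: Callable[[Tokens], float],
    beam_size: int = 3,
) -> Optional[Tokens]:
    """Mask one more token per round until a filled query ranks d_cf above d."""
    # Token positions, most important first (scores assumed fixed across rounds).
    order = sorted(range(len(query)), key=lambda i: importance[i], reverse=True)
    masked = list(query)
    for k, idx in enumerate(order, start=1):
        masked[idx] = MASK  # 2.A / 2.C: mask the next most important token
        beam: List[Tokens] = [list(masked)]
        for pos in sorted(order[:k]):  # 2.B: fill each masked slot, keep a beam
            expanded = []
            for cand in beam:
                for word in fill_mask(cand, pos, d_cf)[:beam_size]:
                    expanded.append(cand[:pos] + [word] + cand[pos + 1:])
            beam = sorted(expanded, key=perplexity)[:beam_size]
        flips = [c for c in beam if rel(c, d_cf) > rel(c, d)]
        if flips:  # 2.D: return the flipping query with the lowest perplexity
            return min(flips, key=perplexity)
    return None  # no flip found even with every token masked


def toy_rel(q: Tokens, doc: str) -> float:
    """Toy search model: count of query tokens present in the document."""
    words = set(doc.split())
    return float(sum(t in words for t in q))


def toy_fill_mask(cand: Tokens, pos: int, d_cf: str) -> List[str]:
    """Toy editor: propose counterfactual-document words not yet in the query."""
    return [w for w in d_cf.split() if w not in cand]


def toy_perplexity(q: Tokens) -> float:
    """Toy fluency proxy: shorter strings count as more fluent."""
    return float(len(" ".join(q)))


query = "python snake habitat".split()
d = "the python snake lives in a tropical habitat"
d_cf = "python programming language tutorial"
words_d, words_cf = set(d.split()), set(d_cf.split())
# Toy masker: a token is important if it supports d but not d_cf.
scores = [1.0 if t in words_d and t not in words_cf else 0.0 for t in query]
q_cf = edit_query(query, d, d_cf, scores, toy_fill_mask, toy_rel, toy_perplexity)
print(q_cf)
```

With these toy components, masking a single token is not enough to flip the pair, so the loop masks a second token before it finds a query that favors `d_cf`. A real system would replace the stand-ins with a neural ranker, a masked language model, and a language-model perplexity score.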