eXplainable Bayesian Multi-Perspective Generative Retrieval

EuiYul Song, Philhoon Oh, Sangryul Kim, James Thorne

TL;DR

This work tackles the lack of interpretability and unreliable uncertainty estimates in deterministic retrieval by integrating uncertainty calibration and explainability into the retrieval pipeline. It combines Bayesian deep learning techniques (e.g., Deep Ensemble, SWA, MC Dropout) with an explainable context reranker (LIME/SHAP) and introduces uncertainty-aware fusion in the decoder alongside multi-perspective retrieval that fuses GENRE and Re3val contexts. Key contributions include a Bayesian Context Reranker, an eXplainable Context Reranker, and a stochastic FiD pre-training approach with Jensen-Shannon Divergence, all shown to improve downstream reader accuracy on three KILT datasets without substantial training overhead. The results demonstrate improved robustness and grounding quality, enabling more reliable, interpretable, and cost-efficient knowledge-grounded language systems.
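As a rough illustration of two ingredients named above, the following sketch averages several stochastic predictive distributions (as one might obtain from ensemble members or repeated dropout-enabled forward passes) and measures the disagreement between two distributions with the Jensen-Shannon Divergence. The function names and the toy inputs are hypothetical, and this summary does not state how the paper applies the divergence during FiD pre-training, so this is a sketch of the standard definitions rather than the authors' implementation.

```python
import math


def mean_distribution(samples):
    """Average several predictive distributions into one.

    `samples` is a list of probability vectors of equal length, e.g. one per
    ensemble member or per stochastic forward pass (assumed input format).
    """
    n = len(samples)
    return [sum(p[i] for p in samples) / n for i in range(len(samples[0]))]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon Divergence: the mean KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    # Two toy predictive distributions over three candidate answers.
    pass_a = [0.7, 0.2, 0.1]
    pass_b = [0.5, 0.3, 0.2]
    print(mean_distribution([pass_a, pass_b]))
    print(js_divergence(pass_a, pass_b))
```

The divergence is symmetric and bounded above by ln 2 (in nats), which makes it a convenient measure of how much two stochastic predictions disagree.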

Abstract

Modern deterministic retrieval pipelines prioritize achieving state-of-the-art performance but often lack interpretability in decision-making. These models face challenges in assessing uncertainty, leading to overconfident predictions. To overcome these limitations, we integrate uncertainty calibration and interpretability into a retrieval pipeline. Specifically, we introduce Bayesian methodologies and multi-perspective retrieval to calibrate uncertainty within a retrieval pipeline. We incorporate techniques such as LIME and SHAP to analyze the behavior of a black-box reranker model. The importance scores derived from these explanation methodologies serve as supplementary relevance scores to enhance the base reranker model. We evaluate the resulting performance enhancements achieved through uncertainty calibration and interpretable reranking on Question Answering and Fact Checking tasks. Our methods demonstrate substantial performance improvements across three KILT datasets.
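The idea of using explanation-derived importance scores as supplementary relevance scores can be sketched as follows. This is a minimal illustration under stated assumptions: each context is assumed to already have one scalar importance score (how token-level LIME or SHAP attributions are aggregated is not specified in this summary), and the additive fusion with a weight `alpha` is a hypothetical choice, not the paper's stated formula.

```python
def fuse_scores(base_scores, importance_scores, alpha=0.5):
    """Add a weighted explanation-based score to each base reranker score.

    `alpha` is a hypothetical mixing weight; the additive form is an assumption.
    """
    if len(base_scores) != len(importance_scores):
        raise ValueError("score lists must have the same length")
    return [b + alpha * s for b, s in zip(base_scores, importance_scores)]


def rerank(contexts, base_scores, importance_scores, alpha=0.5):
    """Return the contexts sorted by fused score, highest first."""
    fused = fuse_scores(base_scores, importance_scores, alpha)
    order = sorted(range(len(contexts)), key=lambda i: fused[i], reverse=True)
    return [contexts[i] for i in order]


if __name__ == "__main__":
    contexts = ["ctx_a", "ctx_b", "ctx_c"]   # placeholder context identifiers
    base = [0.5, 0.6, 0.1]                   # toy base reranker scores
    importance = [0.4, 0.0, 0.1]             # toy per-context importance scores
    print(rerank(contexts, base, importance, alpha=0.5))
```

With `alpha=0` the ranking reduces to the base reranker's ordering, so the weight controls how much the explanation signal is allowed to change the ranking.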

Paper Structure (37 sections, 7 equations, 7 figures, 6 tables)

Figures (7)

  • Figure 1: We add uncertainty calibration and explainability to the black-box reranker system. Performance improves simply by applying these two modules, without a significant increase in inference latency.
  • Figure 2: LIME visualization of the binary context reranker. Among the analyzed features, Chainsmokers is the second most positive feature, after SEP.
  • Figure 3: SHAP visualization for the binary context reranker reveals that the feature Chains has the most significant impact on determining the context as relevant.
  • Figure 4: Average attention score on the first layer.
  • Figure 5: Average attention score on the last layer.
  • ...and 2 more figures