Conceptual Contrastive Edits in Textual and Vision-Language Retrieval
Maria Lymperaiou, Giorgos Stamou
TL;DR
The paper addresses the interpretability of textual and vision-language (VL) retrieval through post-hoc conceptual contrastive edits in a model-agnostic setting. Substitutions are formulated as a minimum-weight matching on a bipartite graph $(S,T,E)$ with edge weights $w_{s\rightarrow t}$ and selection variables $x_{s\rightarrow t}$, subject to $\sum_{t} x_{s\rightarrow t}=1$ for every $s \in S$ and $\sum_{s} x_{s\rightarrow t} \le 1$ for every $t \in T$, so that each source in $S$ receives exactly one substitute from $T$ and no target is used more than once. The matching is solved by the Hungarian algorithm in $O(|S||T|\log|S|)$ time. The paper also introduces the ACE metric, defined as $\mathrm{ACE}=\frac{\mathbb{E}[|o - o^*| / o]}{n} \times \mathrm{scale}$, to quantify the per-word influence of an intervention on ranking outcomes. Experiments on textual and VL retrieval using Flickr reveal part-of-speech (POS) specific effects, invariance patterns, and cross-modal differences, highlighting model biases and the value of controllable, explainable interventions for both unimodal and VL retrieval.
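To make the matching formulation above concrete, the following is a minimal sketch of a textbook Hungarian (Kuhn-Munkres) routine in pure Python. It is not the authors' implementation: the function name `min_weight_matching`, the toy weight matrix, and the requirement $|S| \le |T|$ are assumptions made for illustration, and this variant runs in $O(|S|^2|T|)$ time rather than the bound quoted in the summary.

```python
from math import inf


def min_weight_matching(cost):
    """Minimum-weight assignment of every source (row) to a distinct target (column).

    cost[s][t] plays the role of w_{s->t}; requires len(cost) <= len(cost[0]).
    Returns (assignment, total) where assignment maps source index -> target index.
    Hypothetical helper written for illustration only.
    """
    n, m = len(cost), len(cost[0])
    assert n <= m, "every source needs its own target"
    u = [0.0] * (n + 1)   # dual potentials for sources (1-indexed)
    v = [0.0] * (m + 1)   # dual potentials for targets (1-indexed)
    p = [0] * (m + 1)     # p[t] = source currently matched to target t (0 = free)
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for s in range(1, n + 1):
        p[0] = s
        t0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[t0] = True
            s0 = p[t0]
            delta, t1 = inf, 0
            # Relax reduced costs from the source reached most recently.
            for t in range(1, m + 1):
                if not used[t]:
                    cur = cost[s0 - 1][t - 1] - u[s0] - v[t]
                    if cur < minv[t]:
                        minv[t] = cur
                        way[t] = t0
                    if minv[t] < delta:
                        delta, t1 = minv[t], t
            # Shift potentials so the cheapest frontier edge becomes tight.
            for t in range(m + 1):
                if used[t]:
                    u[p[t]] += delta
                    v[t] -= delta
                else:
                    minv[t] -= delta
            t0 = t1
            if p[t0] == 0:   # reached a free target: augmenting path found
                break
        # Flip the matching along the augmenting path.
        while t0:
            t1 = way[t0]
            p[t0] = p[t1]
            t0 = t1
    assignment = {p[t] - 1: t - 1 for t in range(1, m + 1) if p[t]}
    total = sum(cost[s][t] for s, t in assignment.items())
    return assignment, total


# Toy example: 2 source words, 3 candidate substitutes (weights are made up).
weights = [[4, 1, 3],
           [2, 3, 5]]
assignment, total = min_weight_matching(weights)
print(assignment, total)
```

Each source ends up with exactly one target and no target is reused, which mirrors the two constraints on $x_{s\rightarrow t}$. In practice a library solver for the linear assignment problem would likely be preferable to a hand-written routine.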
Abstract
As deep learning models grow in complexity, achieving model-agnostic interpretability becomes increasingly vital. In this work, we employ post-hoc conceptual contrastive edits to expose noteworthy patterns and biases imprinted in representations of retrieval models. We systematically design optimal and controllable contrastive interventions targeting various parts of speech, and effectively apply them to explain both linguistic and visiolinguistic pre-trained models in a black-box manner. Additionally, we introduce a novel metric to assess the per-word impact of contrastive interventions on model outcomes, providing a comprehensive evaluation of each intervention's effectiveness.
