Are They the Same Picture? Adapting Concept Bottleneck Models for Human-AI Collaboration in Image Retrieval

Vaibhav Balloli, Sara Beery, Elizabeth Bondi-Kelly

TL;DR

This work tackles the challenge of reliable image retrieval in high-stakes domains by enabling human-AI collaboration through a Concept Bottleneck Model (CBM) augmentation. It introduces CHAIR, a two-stage architecture with a Fusion Head that allows concept corrections to edit embeddings, supporting varying levels of human expertise during retrieval while preserving classification performance. Empirical results on CUB and CelebA show CHAIR yields substantial retrieval gains (15–20% in Recall@$k$) and benefits from staged interventions, with Stage 2 improving performance under partial corrections and t-SNE visualizations confirming improved embedding quality. The approach broadens the applicability of CBMs beyond classification, offering practical, interpretable, and intervention-friendly retrieval in domains like wildlife monitoring and healthcare.
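
The retrieval gains above are reported in Recall@$k$. As a minimal sketch of how such a metric is commonly computed, the block below assumes the usual metric-learning definition (a query counts as a hit if any of its $k$ nearest neighbors, excluding itself, shares its label) and cosine similarity. The paper's exact evaluation protocol is not stated here, and the function names and toy data are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def recall_at_k(embeddings, labels, k):
    """Fraction of queries with at least one same-label item in their top-k.

    Assumed definition (not taken from the paper): every item is used as a
    query against all other items, and the query itself is excluded.
    """
    hits = 0
    for i, query in enumerate(embeddings):
        # Rank every other item by similarity to the query, most similar first.
        candidates = [j for j in range(len(embeddings)) if j != i]
        candidates.sort(
            key=lambda j: cosine_similarity(query, embeddings[j]),
            reverse=True,
        )
        if any(labels[j] == labels[i] for j in candidates[:k]):
            hits += 1
    return hits / len(embeddings)


# Toy data (hypothetical): the last item is labeled "a" but sits near class "b".
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.6, 0.8]]
labels = ["a", "a", "b", "b", "a"]

recall_1 = recall_at_k(embeddings, labels, 1)
recall_3 = recall_at_k(embeddings, labels, 3)
```

In this toy set the last item's nearest neighbors belong to the other class, so it is a miss at small $k$ and only becomes a hit once $k$ is large enough to reach a same-label item.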

Abstract

Image retrieval plays a pivotal role in applications from wildlife conservation to healthcare, for finding individual animals or relevant images to aid diagnosis. Although deep learning techniques for image retrieval have advanced significantly, their imperfect real-world performance often necessitates including human expertise. Human-in-the-loop approaches typically rely on humans completing the task independently and then combining their opinions with an AI model in various ways, as these models offer very little interpretability or correctability. To allow humans to intervene in the AI model instead, thereby saving human time and effort, we adapt the Concept Bottleneck Model (CBM) and propose CHAIR. CHAIR (a) enables humans to correct intermediate concepts, which helps improve the generated embeddings, and (b) allows for flexible levels of intervention that accommodate varying levels of human expertise for better retrieval. To show the efficacy of CHAIR, we demonstrate that our method performs better than similar models on image retrieval metrics without any external intervention. Furthermore, we showcase how human intervention further improves retrieval performance, thereby achieving human-AI complementarity.
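
The idea of correcting intermediate concepts so that the embedding changes can be illustrated with a small sketch. Everything below is an assumption for illustration: the excerpt does not specify CHAIR's Fusion Head, so the hypothetical `fuse` function simply concatenates image features with concept values and applies a fixed linear map, and `intervene` overwrites only the concepts a human chooses to correct (a partial intervention).

```python
def intervene(predicted_concepts, corrections):
    """Return a copy of the concept vector with human corrections applied.

    `corrections` maps concept index -> corrected value; concepts that the
    human does not touch keep the model's prediction.
    """
    edited = list(predicted_concepts)
    for index, value in corrections.items():
        edited[index] = value
    return edited


def fuse(image_features, concepts, weights):
    """Hypothetical fusion head: concatenate inputs, then apply a linear map.

    This is a stand-in, not the architecture from the paper.
    """
    joint = list(image_features) + list(concepts)
    return [sum(w * x for w, x in zip(row, joint)) for row in weights]


# Illustrative values only.
image_features = [0.2, -0.5]
predicted_concepts = [0.9, 0.4, 0.1]
weights = [
    [1.0, 0.0, 0.5, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.5, 0.5],
]

# Embedding from the model's own concept predictions.
before = fuse(image_features, predicted_concepts, weights)

# A human corrects only concept 1; the other concepts are left as predicted.
corrected_concepts = intervene(predicted_concepts, {1: 1.0})
after = fuse(image_features, corrected_concepts, weights)
```

Because only the corrected concept feeds into the second output dimension here, the edit shifts that coordinate of the embedding and leaves the first unchanged, which is the kind of concept-driven embedding edit the abstract describes.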

Paper Structure (19 sections, 3 equations, 10 figures, 1 table, 2 algorithms)

Figures (10)

  • Figure 1: Our proposed method allows human-AI collaboration in image retrieval by enabling humans to edit embeddings by correcting them through high-level concepts. We also enable flexible intervention to lower the expertise needed to participate.
  • Figure 2: Illustration of the CBM and CBM-Extend, a naive extension of CBM to correct embeddings for retrieval. ("Edited" here refers to capturing human intervention.)
  • Figure 3: CBM (Koh et al., 2020), HybridCBM (Mahinpei et al., 2021), and the naive CBM extension (Figure 2) have poor retrieval performance compared to their standard counterpart model.
  • Figure 4: High-level overview of FusionCBM. Our proposed CBM architecture enables editing the embeddings using concepts to enable learning better representations and thus improving image retrieval.
  • Figure 5: Comparison of the baseline retrieval performance (without any intervention, if possible) of the standard ResNet model, vanilla CBM, and the proposed CHAIR model.
  • ...and 5 more figures