Adaptive Concept Bottleneck for Foundation Models Under Distribution Shifts

Jihye Choi, Jayaram Raghuram, Yixuan Li, Somesh Jha

TL;DR

This work tackles the challenge of deploying interpretable CBMs on foundation-model backbones when test-time distribution shifts occur. It introduces CONDA, a three-component, test-time adaptation framework consisting of Concept Score Alignment, Linear Probing Adaptation, and a Residual Concept Bottleneck, designed to operate with unlabeled target data and without access to source data. Through extensive experiments across CIFAR-C, Waterbirds, Metashift, and Camelyon17, CONDA consistently improves target performance and yields concept-based explanations that align with the shifted data, sometimes matching or surpassing non-interpretable baselines. The findings demonstrate the practical viability of interpretable foundation-model pipelines in real-world deployments and point to future work on theoretical guarantees and more robust pseudo-labeling techniques.

Abstract

Advancements in foundation models (FMs) have led to a paradigm shift in machine learning. The rich, expressive feature representations from these pre-trained, large-scale FMs are leveraged for multiple downstream tasks, usually via lightweight fine-tuning of a shallow fully-connected network following the representation. However, the non-interpretable, black-box nature of this prediction pipeline can be a challenge, especially in critical domains such as healthcare, finance, and security. In this paper, we explore the potential of Concept Bottleneck Models (CBMs) for transforming complex, non-interpretable foundation models into interpretable decision-making pipelines using high-level concept vectors. Specifically, we focus on the test-time deployment of such an interpretable CBM pipeline "in the wild", where the input distribution often shifts from the original training distribution. We first identify the potential failure modes of such a pipeline under different types of distribution shifts. We then propose an adaptive concept bottleneck framework that addresses these failure modes by dynamically adapting the concept-vector bank and the prediction layer based solely on unlabeled data from the target domain, without access to the source (training) dataset. Empirical evaluations with various real-world distribution shifts show that our adaptation method produces concept-based interpretations better aligned with the test data and boosts post-deployment accuracy by up to 28%, aligning the CBM performance with that of non-interpretable classification.
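
To make the pipeline described in the abstract concrete, the following is a minimal sketch of a concept-bottleneck prediction on top of frozen foundation-model features. It is not the authors' implementation: the function names, the random stand-in data, and the use of a plain dot product for concept scores are all assumptions made for illustration. The argmax pseudo-labeling step is likewise only an assumed stand-in for how unlabeled target data might supply a training signal.

```python
import numpy as np

def concept_scores(features, concept_bank):
    """Project backbone features onto each concept vector (assumed: dot product)."""
    return features @ concept_bank.T            # shape: (n_samples, n_concepts)

def cbm_logits(features, concept_bank, weights, bias):
    """Interpretable prediction: a linear layer applied to the concept scores."""
    return concept_scores(features, concept_bank) @ weights + bias

def concept_contributions(features, concept_bank, weights, class_idx):
    """Per-concept contribution to one class logit (score times weight)."""
    return concept_scores(features, concept_bank) * weights[:, class_idx]

def pseudo_labels(logits):
    """Hypothetical pseudo-labels for unlabeled target data: the argmax class."""
    return logits.argmax(axis=1)

# Random stand-ins for backbone features, concept bank, and linear layer.
rng = np.random.default_rng(0)
n_samples, feat_dim, n_concepts, n_classes = 4, 8, 5, 3
features = rng.normal(size=(n_samples, feat_dim))
concept_bank = rng.normal(size=(n_concepts, feat_dim))
weights = rng.normal(size=(n_concepts, n_classes))
bias = np.zeros(n_classes)

logits = cbm_logits(features, concept_bank, weights, bias)
labels = pseudo_labels(logits)
contrib = concept_contributions(features, concept_bank, weights, class_idx=0)
```

Because each logit is a sum of per-concept contributions plus a bias, the `contrib` array can be read directly as a concept-based explanation for class 0. Adapting the concept bank and the linear layer, as the abstract describes, would change both `concept_bank` and `weights` in this sketch.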

Paper Structure

This paper contains 28 sections, 15 equations, 8 figures, 8 tables, 1 algorithm.

Figures (8)

  • Figure 1: Concept-based predictions are not inherently more robust to distribution shifts than feature-based predictions, necessitating dynamic adaptation after deployment. We observe significant drops in the averaged group accuracy (AVG) and worst-group accuracy (WG) from the source to the target (test) domain under two types of distribution shifts: (1) low-level shift (left), where inputs are perturbed without modifying class-level semantics (e.g., Gaussian noise); and (2) concept-level shift (right), where some high-level semantics change. On the left, predictions made through high-level concepts (e.g., by PCBM (Yuksekgonul et al., 2023) here) are not necessarily more robust to low-level input perturbations. On the right, the performance of concept-based predictions suffers an even more drastic drop, failing to leverage the expressiveness of the foundation model's high-level features and falling behind direct feature-based predictions (here, zero-shot and linear-probing-based classification). However, with CONDA (our method), we can boost the performance of the deployed concept-based predictor to be on par with, or even better than, its non-interpretable counterparts.
  • Figure 2: Overview of CONDA, our proposed adaptation framework. The foundation model and CBM pipeline trained on the source domain are shown at the top, while the adapted CBM, consisting of a main branch and a residual branch, is shown at the bottom. The components of the CBM that are adapted during each stage of the proposed method (i.e., CSA, LPA, and RCB) are shown in different colors.
  • Figure 3: Effectiveness of individual components of CONDA for the CBM method of Yuksekgonul et al. (2023). We report the relative AVG and WG, defined as (acc. after adaptation) $-$ (acc. before adaptation).
  • Figure 4: CONDA adapts the concept weights to be tailored to the target data. We visualize the linear probing layer weights (width of each mapping) before vs. after applying CONDA to the PCBM baseline (Yuksekgonul et al., 2023) on the Waterbirds dataset. We only show the mappings with positive weights.
  • Figure 5: Ablations on the hyper-parameters in CONDA. We ablate on the individual hyper-parameters in CONDA for each type of distribution shift: (1) CIFAR10-C (impulse noise) simulating low-level shift, and (2) Waterbirds simulating concept-level shift.
  • ...and 3 more figures
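
The metrics named in the captions of Figures 1 and 3 can be sketched as follows. This is an illustrative computation only: it assumes AVG is the unweighted mean of per-group accuracies and WG is the minimum per-group accuracy, and the function names and toy data are hypothetical rather than taken from the paper.

```python
def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct, total = {}, {}
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] = total.get(g, 0) + 1
        correct[g] = correct.get(g, 0) + int(t == p)
    return {g: correct[g] / total[g] for g in total}

def avg_and_worst_group(y_true, y_pred, groups):
    """AVG (assumed: unweighted mean over groups) and WG (minimum over groups)."""
    accs = group_accuracies(y_true, y_pred, groups)
    return sum(accs.values()) / len(accs), min(accs.values())

def relative_gain(y_true, pred_before, pred_after, groups):
    """Relative AVG and WG: (acc. after adaptation) - (acc. before adaptation)."""
    avg_b, wg_b = avg_and_worst_group(y_true, pred_before, groups)
    avg_a, wg_a = avg_and_worst_group(y_true, pred_after, groups)
    return avg_a - avg_b, wg_a - wg_b

# Toy data: six samples split across two groups.
y_true      = [0, 0, 1, 1, 1, 0]
groups      = ["a", "a", "a", "b", "b", "b"]
pred_before = [0, 1, 1, 0, 0, 0]
pred_after  = [0, 0, 1, 1, 0, 0]

rel_avg, rel_wg = relative_gain(y_true, pred_before, pred_after, groups)
```

A positive `rel_avg` or `rel_wg` means adaptation helped on that metric; reporting both guards against gains in average accuracy that come at the expense of the worst-performing group.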

Theorems & Definitions (1)

  • Definition 1