AdaCBM: An Adaptive Concept Bottleneck Model for Explainable and Accurate Diagnosis
Townim F. Chowdhury, Vu Minh Hieu Phan, Kewen Liao, Minh-Son To, Yutong Xie, Anton van den Hengel, Johan W. Verjans, Zhibin Liao
TL;DR
Problem: Medical image diagnosis demands transparent reasoning; CLIP-based CBMs offer interpretable concept explanations but struggle with domain transfer due to generic pretraining.
Approach: AdaCBM inserts a lightweight adapter between CLIP and CBM to align image embeddings with concept space, while using a fixed concept base and a stationary mask to restrict concept contributions; this yields a linear, interpretable logit for each class without fully retraining the backbone.
Contributions: (i) linear reinterpretation of CBMs with an adaptive bridge; (ii) GPT-4–driven, prompt-engineered concept generation for visual attributes and concise phrases; (iii) a concept-utility filter combining Welch's t-test and Pearson correlation to select discriminative, non-redundant concepts.
Experiments: Across HAM, BCCD, and DR, AdaCBM achieves accuracy comparable to a linear classifier with the same backbone and outperforms post-CBM fine-tuning, with robust ablations on adapter placement and concept generation.
Impact: This framework offers an end-to-end recipe to leverage GPT, CLIP, and CBM in medical diagnostics with strong interpretability and practical training efficiency.
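The concept-utility filter named under Contributions can be illustrated with a small sketch. The summary above does not say how the two statistics are combined, so the following is only one plausible reading: rank concepts by the magnitude of Welch's t statistic between two classes, then greedily drop any concept whose Pearson correlation with an already kept concept is too high. The function names, the `max_corr` threshold, the binary-label setting, and the synthetic scores are all assumptions made for illustration.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    va, vb = a.var(ddof=1), b.var(ddof=1)
    return (a.mean() - b.mean()) / np.sqrt(va / len(a) + vb / len(b))

def select_concepts(scores, labels, max_corr=0.9, top_k=None):
    """Hypothetical filter: scores is (n_samples, n_concepts), labels is binary.

    Concepts are ranked by |t| (discriminative power) and kept only if their
    absolute Pearson correlation with every kept concept is below max_corr.
    """
    pos, neg = scores[labels == 1], scores[labels == 0]
    t = np.array([welch_t(pos[:, j], neg[:, j]) for j in range(scores.shape[1])])
    order = np.argsort(-np.abs(t))             # most discriminative first
    corr = np.corrcoef(scores, rowvar=False)   # concept-by-concept Pearson matrix
    kept = []
    for j in order:
        if all(abs(corr[j, i]) < max_corr for i in kept):
            kept.append(int(j))
        if top_k is not None and len(kept) == top_k:
            break
    return kept, t

# Synthetic concept scores (not real data): one discriminative concept,
# a near-duplicate of it, and one uninformative concept.
rng = np.random.default_rng(0)
n = 200
labels = np.repeat([0, 1], n // 2)
c0 = rng.normal(size=n) + 2.0 * labels
c1 = c0 + 0.01 * rng.normal(size=n)
c2 = rng.normal(size=n)
scores = np.column_stack([c0, c1, c2])

kept, t = select_concepts(scores, labels, top_k=2)
print("kept concept indices:", kept)
```

In this toy setting only one of the two near-duplicate concepts survives, which is the non-redundancy behavior the filter is meant to provide.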
Abstract
The integration of vision-language models such as CLIP with Concept Bottleneck Models (CBMs) offers a promising approach to explaining deep neural network (DNN) decisions using concepts understandable by humans, addressing the black-box concern of DNNs. While CLIP provides both explainability and zero-shot classification capability, its pre-training on generic image and text data may limit its classification accuracy and applicability to medical image diagnostic tasks, creating a transfer learning problem. To maintain explainability while addressing transfer learning needs, CBM methods commonly add post-processing modules after the bottleneck module. However, this approach has proven ineffective. This paper takes an unconventional approach by re-examining the CBM framework through the lens of its geometrical representation as a simple linear classification system. The analysis reveals that post-CBM fine-tuning modules merely rescale and shift the classification outcome of the system, failing to fully leverage the system's learning potential. We introduce an adaptive module strategically positioned between CLIP and CBM to bridge the gap between source and downstream domains. This simple yet effective approach enhances classification performance while preserving the explainability afforded by the framework. Our work offers a comprehensive solution that encompasses the entire process, from concept discovery to model training, providing a holistic recipe for leveraging the strengths of GPT, CLIP, and CBM.
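A toy numerical sketch may help make the linear view concrete. It treats the CBM as concept scores (dot products between an image embedding and concept text embeddings) followed by a linear class layer, and contrasts a post-CBM module with an adapter placed before the bottleneck. Everything specific here is an assumption for illustration: random vectors stand in for CLIP embeddings, the post-CBM module is modeled as a per-class scale and shift, and the adapter is modeled as a residual linear map; the paper's actual modules may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n_classes = 8, 5, 3                # toy sizes: embedding dim, concepts, classes

x = rng.normal(size=d)                   # stand-in for a CLIP image embedding
C = rng.normal(size=(k, d))              # stand-in for CLIP text embeddings of k concepts
W = rng.normal(size=(n_classes, k))      # concept-to-class weights of the CBM

def cbm_logits(x, C, W):
    """Class logits as a linear function of the concept scores C @ x."""
    return W @ (C @ x)

base = cbm_logits(x, C, W)

# Post-CBM module (assumed form): acts on the logits only, so the concept
# scores are untouched and each logit is just rescaled and shifted.
scale = rng.uniform(0.5, 2.0, size=n_classes)
shift = rng.normal(size=n_classes)
post = scale * base + shift

# Adapter before the bottleneck (assumed form: residual linear map).
# It changes the embedding, and therefore the concept scores themselves.
A = 0.1 * rng.normal(size=(d, d))
adapted_scores = C @ (x + A @ x)
adapted = W @ adapted_scores

# Each adapted logit is still a sum of per-concept contributions,
# which is what keeps the explanation readable.
contributions = W * adapted_scores       # shape (n_classes, k)
print("base logits:   ", base)
print("post-CBM:      ", post)
print("with adapter:  ", adapted)
```

The per-concept contribution table sums row-wise to the adapted logits, so moving the trainable module in front of the bottleneck does not break the linear explanation in this sketch.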
