An Axiomatic Approach to Model-Agnostic Concept Explanations
Zhili Feng, Michal Moshkovitz, Dotan Di Castro, J. Zico Kolter
TL;DR
This work addresses the lack of model-agnostic concept explanations by introducing an axiomatic framework built on three core principles: linearity with respect to examples, recursivity, and similarity. From these axioms, the authors derive a family of concept-influence measures: a symmetric form, a class-conditioned form (measuring necessity), and a concept-conditioned form (measuring sufficiency). They also provide an efficient estimation algorithm. The framework is connected to prior methods: TCAV corresponds to the necessity measure and completeness-aware explanations correspond to the sufficiency measure, while the new measures can be computed faster and without model-specific machinery. Experiments on model selection, optimizer selection, and prompt editing for CLIP-based vision-language models demonstrate the approach's practical utility and interpretability, including automatic concept labeling. Overall, the method offers a principled, scalable way to understand and improve black-box models through interpretable concepts without requiring access to internal model details.
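To make the three measure forms concrete, here is a minimal illustrative sketch. It assumes the model is queried only as a black box, giving a score f(x) in [0, 1] for a target class, and that a concept is given as a per-example score c(x) in [0, 1]. The specific formulas below (a sample average of f(x)c(x), and that average normalized by the mean of f or of c) are assumptions chosen to convey the symmetric, necessity, and sufficiency intuitions. They are not the paper's exact definitions, and the function name `concept_measures` is hypothetical.

```python
from typing import Dict, Sequence


def concept_measures(f_scores: Sequence[float], c_scores: Sequence[float]) -> Dict[str, float]:
    """Hypothetical sample-based concept measures (illustrative, not the paper's exact formulas).

    f_scores[i]: black-box model score for the target class on example i, in [0, 1].
    c_scores[i]: degree to which the concept is present in example i, in [0, 1].
    """
    n = len(f_scores)
    if n == 0 or n != len(c_scores):
        raise ValueError("need two non-empty score lists of equal length")

    # Each example contributes additively, in the spirit of linearity w.r.t. examples.
    joint = sum(f * c for f, c in zip(f_scores, c_scores)) / n  # assumed symmetric form: E[f c]
    mean_f = sum(f_scores) / n
    mean_c = sum(c_scores) / n
    if mean_f == 0 or mean_c == 0:
        raise ValueError("class or concept never occurs in the sample")

    return {
        "symmetric": joint,
        # Assumed class-conditioned form: how strongly the concept is present
        # when the model predicts the class (necessity intuition).
        "necessity": joint / mean_f,
        # Assumed concept-conditioned form: how strongly the model predicts
        # the class when the concept is present (sufficiency intuition).
        "sufficiency": joint / mean_c,
    }


if __name__ == "__main__":
    f = [1.0, 0.0, 1.0, 0.5]  # toy model scores
    c = [1.0, 1.0, 0.0, 0.0]  # toy concept scores
    print(concept_measures(f, c))
```

Because only model outputs on sampled inputs are needed, a sketch like this works for any model, which mirrors the model-agnostic goal. The two conditioned forms differ only in which marginal they normalize by, which is one way to see why a class-conditioned method (like TCAV) and a concept-conditioned method (like completeness-aware explanations) answer different questions.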
Abstract
Concept explanation is a popular approach for examining how human-interpretable concepts impact the predictions of a model. However, most existing methods for concept explanations are tailored to specific models. To address this issue, this paper focuses on model-agnostic measures. Specifically, we propose an approach to concept explanations that satisfy three natural axioms: linearity, recursivity, and similarity. We then establish connections with previous concept explanation methods, offering insight into their varying semantic meanings. Experimentally, we demonstrate the utility of the new method by applying it in different scenarios: model selection, optimizer selection, and model improvement via a form of prompt editing for zero-shot vision-language models.
