Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

Shufan Shen; Zhaobo Qi; Junshu Sun; Qingming Huang; Qi Tian; Shuhui Wang

TL;DR

The paper asks whether pre-trained visual representations can be both interpretable and highly classifiable. It introduces the Inherent Interpretability Score (IIS), which quantifies how much accuracy is retained when predictions are made from interpretable concepts instead of the original representations: $IIS = \int_{s} ARR(s)\, ds$, where the accuracy retention rate is $ARR = \frac{\text{Acc}(f, g_{cls}\circ g_s \circ g_{\mathcal{C}}, \mathcal{D})}{\text{Acc}(f, h, \mathcal{D})}$. The authors construct four concept libraries (Prototype, Cluster, End2End, Text), project representations into a sparse concept space, and study the relationship between interpretability and downstream classifiability across architectures, datasets, and training stages, finding a robust positive correlation. They further show that fine-tuning with interpretability maximization can improve classifiability, and that predictions based on high-ARR interpretations can reach accuracy close to that of the original representations, reducing training costs. The work offers a practical pathway to jointly improve interpretability and performance in vision models, and the code is released for reproducibility.
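The two formulas above can be illustrated with a minimal numerical sketch. It is not the authors' implementation: it assumes that $s$ is a sparsity level sampled on a finite grid, that the accuracies of the concept-based predictor and of the original representation have already been measured, and that the integral is approximated with the trapezoidal rule. The function names and the example numbers are hypothetical.

```python
def accuracy_retention_rate(acc_interpretable, acc_original):
    """ARR: accuracy of the concept-based predictor divided by the
    accuracy obtained from the original representation."""
    if acc_original <= 0:
        raise ValueError("acc_original must be positive")
    return acc_interpretable / acc_original


def inherent_interpretability_score(sparsities, accs_interpretable, acc_original):
    """Approximate IIS = integral of ARR(s) ds with the trapezoidal rule.

    Assumption: `sparsities` is a finite grid of values of s, and
    `accs_interpretable[i]` is the accuracy measured at `sparsities[i]`.
    """
    if len(sparsities) != len(accs_interpretable):
        raise ValueError("need one accuracy per sparsity level")
    # Sort by sparsity so the integration runs left to right.
    points = sorted(zip(sparsities, accs_interpretable))
    curve = [(s, accuracy_retention_rate(a, acc_original)) for s, a in points]
    area = 0.0
    for (s0, r0), (s1, r1) in zip(curve, curve[1:]):
        area += 0.5 * (r0 + r1) * (s1 - s0)
    return area


# Hypothetical measurements, for illustration only.
sparsity_grid = [0.0, 0.5, 1.0]
concept_accuracies = [0.4, 0.6, 0.8]
original_accuracy = 0.8
iis = inherent_interpretability_score(
    sparsity_grid, concept_accuracies, original_accuracy
)
```

Under these assumptions, a representation whose concept-based predictions retain more accuracy across the sparsity grid has a larger area under the ARR curve, and therefore a higher score.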

Abstract

The visual representation of a pre-trained model prioritizes classifiability on downstream tasks, while the widespread application of pre-trained visual models has posed new requirements for representation interpretability. However, it remains unclear whether pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS), which evaluates the information loss, measures the ratio of interpretable semantics, and quantifies the representation interpretability. When evaluating the interpretability of representations with different classifiability, we surprisingly discover that interpretability and classifiability are positively correlated, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. This observation yields two benefits for pre-trained representations. First, the classifiability of representations can be further improved by fine-tuning with interpretability maximization. Second, as the classifiability of the representations improves, predictions based on their interpretations suffer less accuracy degradation. The discovered positive correlation and the corresponding applications show that practitioners can unify improvements in interpretability and classifiability for pre-trained vision models. Code is available at https://github.com/ssfgunner/IIS.
