Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang

TL;DR

The paper asks whether pre-trained visual representations can be both interpretable and highly classifiable. It introduces the Inherent Interpretability Score (IIS), a quantitative measure of how much accuracy is retained when predictions are made only from interpretable concepts. The score is formalized as $IIS = \int_{s} ARR(s)\, ds$, where the accuracy-retention term is $ARR = \frac{\text{Acc}(f, g_{cls}\circ g_s \circ g_{\mathcal{C}}, \mathcal{D})}{\text{Acc}(f, h, \mathcal{D})}$. The authors construct four concept libraries (Prototype, Cluster, End2End, Text), project representations into a sparse concept space, and study the relationship between interpretability and downstream classifiability across architectures, datasets, and training stages, finding a robust positive correlation. They further show that fine-tuning with interpretability maximization can improve classifiability, and that predictions based on high-ARR interpretations can reach accuracy close to that of the original representations, reducing training costs. The work offers a practical pathway to jointly improve interpretability and performance in vision models, and the authors release code for reproducibility.
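The IIS definition above can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes that $s$ is a sparsity level sampled at a few discrete points, that the accuracy of the concept-based predictor has already been measured at each point, and that the integral is approximated with the trapezoidal rule. The function names and all accuracy values are hypothetical.

```python
from typing import Sequence


def accuracy_retention(interp_acc: float, original_acc: float) -> float:
    """ARR: accuracy of concept-based predictions over the original accuracy."""
    if original_acc <= 0:
        raise ValueError("original_acc must be positive")
    return interp_acc / original_acc


def inherent_interpretability_score(
    sparsity_levels: Sequence[float],
    interp_accs: Sequence[float],
    original_acc: float,
) -> float:
    """Approximate IIS = integral of ARR(s) ds with the trapezoidal rule.

    Assumption: sparsity_levels are the sampled values of s, and interp_accs
    holds the measured accuracy of the interpretable predictor at each one.
    """
    if len(sparsity_levels) != len(interp_accs):
        raise ValueError("sparsity_levels and interp_accs must have equal length")
    if len(sparsity_levels) < 2:
        raise ValueError("need at least two sparsity levels to integrate")

    # Sort by sparsity so the integration runs left to right.
    points = sorted(zip(sparsity_levels, interp_accs))
    curve = [(s, accuracy_retention(acc, original_acc)) for s, acc in points]

    area = 0.0
    for (s0, r0), (s1, r1) in zip(curve, curve[1:]):
        area += 0.5 * (r0 + r1) * (s1 - s0)
    return area


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    levels = [0.0, 0.5, 1.0]
    accs = [0.4, 0.6, 0.8]
    print(inherent_interpretability_score(levels, accs, original_acc=0.8))
```

With these made-up numbers the ARR curve rises from 0.5 to 1.0 and the score comes out at about 0.75. A representation whose concept-based predictions retain more accuracy at every sparsity level would produce a larger area, and therefore a higher IIS.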

Abstract

The visual representation of a pre-trained model prioritizes classifiability on downstream tasks, while the widespread application of pre-trained visual models has posed new requirements for representation interpretability. However, it remains unclear whether pre-trained representations can achieve high interpretability and classifiability simultaneously. To answer this question, we quantify representation interpretability by leveraging its correlation with the ratio of interpretable semantics within the representations. Given the pre-trained representations, only the interpretable semantics can be captured by interpretations, whereas the uninterpretable part leads to information loss. Based on this fact, we propose the Inherent Interpretability Score (IIS), which evaluates the information loss, measures the ratio of interpretable semantics, and quantifies representation interpretability. In evaluating the interpretability of representations with different classifiability, we surprisingly discover that interpretability and classifiability are positively correlated, i.e., representations with higher classifiability provide more interpretable semantics that can be captured in the interpretations. This observation further supports two benefits to the pre-trained representations. First, the classifiability of representations can be further improved by fine-tuning with interpretability maximization. Second, with the classifiability improvement for the representations, we obtain predictions based on their interpretations with less accuracy degradation. The discovered positive correlation and corresponding applications show that practitioners can unify the improvements in interpretability and classifiability for pre-trained vision models. Code is available at https://github.com/ssfgunner/IIS.

Paper Structure

This paper contains 14 sections, 8 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Classifiability-oriented representations have uninterpretable semantics, causing an interpretability reduction. We propose the IIS to quantify representation interpretability with the maintenance of task-relevant semantics in interpretations (left). By comparing the IIS and prediction accuracy of representations from different models, we observe a positive correlation between interpretability and classifiability (right).
  • Figure 2: Definition and computation of the IIS. Given a pre-trained model and a downstream task, we first collect task-relevant concept libraries (left) and interpret model representations by projecting them into the concept space. By interpreting representations with sparse concepts, we can extract their interpretable semantics (middle). The IIS is defined as the representation's ability to retain accuracy when predicting solely based on interpretations (right).
  • Figure 3: The relationship between IIS and the prediction accuracy of pre-trained representations on ImageNet. We provide experiments with four types of concept libraries.
  • Figure 4: The relationship between the accuracy and IIS on three datasets (CUB-200, CIFAR-10, and CIFAR-100). The IIS is computed based on the textual concept library.
  • Figure 5: The evolution of the IIS with varying accuracy (a) and epochs during the pre-training process of representations from ResNet-50 (b) and ViT-B (c) on ImageNet.
  • ...and 4 more figures