B-cosification: Transforming Deep Neural Networks to be Inherently Interpretable

Shreyash Arya, Sukrut Rao, Moritz Böhle, Bernt Schiele

TL;DR

This work tackles the high cost of obtaining inherently interpretable neural networks by introducing B-cosification, a method to fine-tune pre-trained models into B-cos DNNs that preserve accuracy while yielding faithful, human-aligned explanations. The authors provide a concrete conversion pipeline, including handling 6-channel inputs, removing biases, and selecting the alignment parameter $B$, and demonstrate strong results across CNNs, ViTs, and CLIP with notable training-time savings. They show that B-cosified models achieve competitive or superior accuracy and significantly improved interpretability as measured by GridPG, with up to 9x speedups in some cases. Importantly, B-cosified CLIP maintains strong zero-shot performance while delivering interpretable and model-faithful explanations, suggesting broad applicability to foundation models with lower resource requirements.
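
The 6-channel input mentioned above can be illustrated with a short sketch. It assumes the encoding used in the original B-cos work, where each RGB pixel with values in [0, 1] is extended by its complement to `[r, g, b, 1-r, 1-g, 1-b]`. The function names `encode_pixel` and `encode_image` are hypothetical and are not taken from the released code.

```python
def encode_pixel(rgb):
    """Map an (r, g, b) pixel with values in [0, 1] to six channels.

    Assumed encoding: the three colour values followed by their complements.
    """
    r, g, b = rgb
    return (r, g, b, 1.0 - r, 1.0 - g, 1.0 - b)


def encode_image(image):
    """Apply the encoding to an image given as rows of (r, g, b) tuples."""
    return [[encode_pixel(px) for px in row] for row in image]


# A 1x2 toy image: one black pixel and one orange-ish pixel.
image = [[(0.0, 0.0, 0.0), (1.0, 0.5, 0.25)]]
encoded = encode_image(image)
```

Under this encoding a black pixel no longer maps to the zero vector, so colour is carried by the direction of the pixel vector rather than only by its norm.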

Abstract

B-cos Networks have been shown to be effective for obtaining highly human-interpretable explanations of model decisions by architecturally enforcing stronger alignment between inputs and weights. B-cos variants of convolutional networks (CNNs) and vision transformers (ViTs), which primarily replace linear layers with B-cos transformations, perform competitively with their respective standard variants while also yielding explanations that are faithful by design. However, it has so far been necessary to train these models from scratch, which is increasingly infeasible in the era of large, pre-trained foundation models. In this work, inspired by the architectural similarities between standard DNNs and B-cos networks, we propose 'B-cosification', a novel approach to transform existing pre-trained models to become inherently interpretable. We perform a thorough study of the design choices involved in this conversion, both for convolutional neural networks and vision transformers. We find that B-cosification can yield models that are on par with B-cos models trained from scratch in terms of interpretability, while often outperforming them in terms of classification performance at a fraction of the training cost. Subsequently, we apply B-cosification to a pre-trained CLIP model and show that, even with limited data and compute, we obtain a B-cosified version that is highly interpretable and competitive in zero-shot performance across a variety of datasets. We release our code and pre-trained model weights at https://github.com/shrebox/B-cosification.
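
To make the architectural similarity between linear layers and B-cos transformations concrete, here is a minimal sketch of a single B-cos unit. It assumes the formulation from the original B-cos work, $\text{B-cos}(\mathbf{x}; \mathbf{w}) = \hat{\mathbf{w}}^\top \mathbf{x} \cdot |\cos(\mathbf{x}, \hat{\mathbf{w}})|^{B-1}$ with $\hat{\mathbf{w}} = \mathbf{w} / \lVert\mathbf{w}\rVert$. The function name `bcos_unit` is hypothetical, and the unit-norm weights are part of that assumed definition rather than a statement about how the B-cosified models are implemented.

```python
import math


def bcos_unit(x, w, b=2.0):
    """B-cos transform of input vector x for one unit with weights w.

    Assumed definition: (w_hat . x) * |cos(x, w_hat)|^(b - 1),
    where w_hat is w scaled to unit norm. There is no bias term.
    """
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    x_norm = math.sqrt(sum(xi * xi for xi in x))
    if x_norm == 0.0:
        return 0.0  # a zero input yields a zero output
    w_hat = [wi / w_norm for wi in w]
    linear = sum(wi * xi for wi, xi in zip(w_hat, x))  # w_hat^T x
    cos = linear / x_norm  # cosine between x and w_hat
    return linear * abs(cos) ** (b - 1.0)


x = [3.0, 4.0]
w = [1.0, 0.0]
out_linear = bcos_unit(x, w, b=1.0)   # reduces to the bias-free linear unit
out_bcos = bcos_unit(x, w, b=2.0)     # poorly aligned input is down-weighted
out_aligned = bcos_unit([2.0, 0.0], w, b=2.0)  # perfectly aligned input
```

With `b=1.0` the cosine factor vanishes and the unit is a bias-free linear map, which is the sense in which a conventional layer can be read as a special case. Larger values of `b` suppress outputs for inputs that are poorly aligned with the weight vector.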

Paper Structure

This paper contains 24 sections, 4 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: B-cosification: Obtaining inherently interpretable models with competitive accuracy at low cost. Left: Accuracy progression over epochs for a DenseNet-121 and a ViT-S, comparing B-cosified (blue) and B-cos (orange) training curves. B-cosified models achieve equivalent accuracy with a substantial reduction in training time, yielding a 4.7x speedup for DenseNet-121 and a 9.0x speedup for ViT-S. Right: Qualitative comparison of explanations for various images for B-cos [boehle2024bcos] and our B-cosified models at various stages of training. Specifically, we show the dynamic linear mappings $\mathbf{W}(\mathbf{x})$ computed by the models in color as in [boehle2022bcos, boehle2024bcos]; note that by formulating conventional models ('initial' in the plot) as a specific version of B-cos models, we are able to visualise the corresponding explanations in color too, see \ref{subsubsec:func_equiv} for further details. We find that after only one epoch of training, the B-cosified models exhibit explanations similar to those of B-cos models.
  • Figure 2: B-cosified CLIP Models. After B-cosifying a CLIP model and fine-tuning it according to our proposed B-cosification scheme, see \ref{sec:standardvsbcos:diff}, we find that it is possible to endow the model with the same level of inherent interpretability as the B-cos models proposed in [boehle2024bcos], whilst maintaining CLIP's zero-shot ability (see \ref{fig:clip_zeroshot_main}). The resulting linear summaries of the models ($\mathbf{W}(\mathbf{x})$) can be visualised in color (row 3) and provide significantly more detail than GradCAM explanations (row 2), which are often used to explain conventional CLIP models.
  • Figure 3: Localisation Performance of $\mathbf{W}(\mathbf{x})\mathbf{x}$. We compute the contribution maps according to the dynamic linear summaries $\mathbf{W}(\mathbf{x})$ of the pre-trained models ('Standard'), their B-cosified versions, and the original pre-trained B-cos models, and evaluate their localisation performance on the Grid Pointing Game as in [boehle2024bcos]. We find that localisation improves significantly for B-cosified models, achieving results on par with the models of [boehle2024bcos].
  • Figure 4: Comparison to Post-hoc Methods. For two of the models in \ref{fig:comp_across_models} (ResNet-50-v1, DenseNet-121) we compare the localisation performance of the model-inherent contribution maps $\mathbf{W}(\mathbf{x})\mathbf{x}$ to post-hoc explanations for the pre-trained models. Similar to the original B-cos models [boehle2024bcos], the model-inherent explanations perform favourably.
  • Figure 5: Classification performance on the CLIP Benchmark [clipbench2024] of various CLIP models in the zero-shot setting (left) and under linear probing (right). Specifically, we compare two B-cosified CLIPs, trained on ImageNet (IMN) and CC3M respectively, to the Text2Concept approach by [moayeri2023text] and the original pre-trained CLIP model. We find B-cosified versions of CLIP to consistently outperform Text2Concept on natural and specialised data.
  • ...and 4 more figures
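
The localisation metric referenced in Figures 3 and 4 can be sketched as follows. The sketch assumes the usual reading of the Grid Pointing Game: images from different classes are tiled into a grid, an attribution map is computed for one class, and the score is the fraction of positive attribution that falls inside that class's grid cell. The function name `grid_pg_score` and the toy values are hypothetical and are not results from the paper.

```python
def grid_pg_score(attribution, cell_rows, cell_cols):
    """Fraction of positive attribution that lies inside the target cell.

    attribution: 2D list of floats (one value per pixel).
    cell_rows, cell_cols: range objects delimiting the target grid cell.
    Negative attributions are ignored, as assumed for this metric.
    """
    total = 0.0
    inside = 0.0
    for i, row in enumerate(attribution):
        for j, value in enumerate(row):
            positive = max(value, 0.0)
            total += positive
            if i in cell_rows and j in cell_cols:
                inside += positive
    return inside / total if total > 0.0 else 0.0


# Toy 2x2 grid in which every cell is a single pixel.
attribution = [
    [3.0, 1.0],
    [-2.0, 0.0],
]
score = grid_pg_score(attribution, range(0, 1), range(0, 1))
```

A score of 1.0 would mean that all positive attribution lies in the correct cell, while a uniform map over an n-by-n grid would score about 1/n².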