Convolutional Dynamic Alignment Networks for Interpretable Classifications
Moritz Böhle, Mario Fritz, Bernt Schiele
TL;DR
The paper addresses the interpretability gap in deep networks with Convolutional Dynamic Alignment Networks (CoDA-Nets), which are built from Dynamic Alignment Units (DAUs) that apply input-dependent linear transforms. Because each layer is linear given its input, the output decomposes faithfully into per-input contributions, and because the DAU weights align with discriminative patterns, the resulting model-inherent contribution maps are of high quality. Empirically, CoDA-Nets reach competitive accuracy on CIFAR-10 and TinyImagenet and score better on quantitative attribution metrics than standard post-hoc methods; temperature scaling is used to modulate the alignment pressure. The result is an inherently interpretable classifier with detailed, locality-sensitive explanations and no substantial loss in performance. Shared components and more efficient implementations are named as routes to scaling in future work.
Abstract
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA-Nets), which are performant classifiers with a high degree of inherent interpretability. Their core building blocks are Dynamic Alignment Units (DAUs), which linearly transform their input with weight vectors that dynamically align with task-relevant patterns. As a result, CoDA-Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions. Given the alignment of the DAUs, the resulting contribution maps align with discriminative input patterns. These model-inherent decompositions are of high visual quality and outperform existing attribution methods under quantitative metrics. Further, CoDA-Nets constitute performant classifiers, achieving results on par with ResNet and VGG models on, e.g., CIFAR-10 and TinyImagenet.
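The decomposition described in the abstract can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not the authors' implementation: the layers are dense rather than convolutional, the weight vector of each unit is assumed to take the low-rank form w(x) = g(ABx + b) with g rescaling to unit L2 norm, and the names `DAULayer`, `l2_bound`, and `forward_with_contributions` are hypothetical. The point it shows is that stacking input-dependent linear maps yields a single effective matrix W(x), so each output is an exact sum of per-input contributions.

```python
import numpy as np


def l2_bound(v, eps=1e-12):
    """Rescale v to (at most) unit L2 norm; an assumed choice for g."""
    return v / (np.linalg.norm(v) + eps)


class DAULayer:
    """Dense layer of n_out dynamic alignment units on a d-dim input.

    Unit j builds its own weight vector w_j(x) = g(A_j B x + b_j) and
    outputs w_j(x)^T x. Sharing B across units is an illustrative choice.
    """

    def __init__(self, d, n_out, rank, rng):
        self.B = rng.normal(size=(rank, d))
        self.A = rng.normal(size=(n_out, d, rank))
        self.b = rng.normal(size=(n_out, d))

    def weights(self, x):
        """Return the input-dependent weight matrix W(x), shape (n_out, d)."""
        pre = self.A @ (self.B @ x) + self.b  # (n_out, d)
        return np.stack([l2_bound(row) for row in pre])

    def __call__(self, x):
        # Linear in x once W(x) is fixed: no bias, no extra nonlinearity.
        return self.weights(x) @ x


def forward_with_contributions(layers, x):
    """Run the stack and return (output, per-input contribution matrix)."""
    W_total = np.eye(x.shape[0])
    h = x
    for layer in layers:
        W = layer.weights(h)      # weights depend on the current activation
        W_total = W @ W_total     # compose into one effective linear map
        h = W @ h
    # contributions[c, i] is the share of output c attributed to input i.
    contributions = W_total * x
    return h, contributions


rng = np.random.default_rng(0)
layers = [DAULayer(8, 6, 3, rng), DAULayer(6, 4, 3, rng)]
x = rng.normal(size=8)
logits, contributions = forward_with_contributions(layers, x)

# The contributions of all inputs add up to each output exactly.
assert np.allclose(contributions.sum(axis=1), logits)
```

Because each weight vector has bounded norm, a unit's output can only be large when its weights point in the direction of the input, which is the alignment pressure the abstract refers to. In a convolutional model, reshaping a row of `contributions` to the input's spatial layout would give the contribution map for that class.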
