Optimising for Interpretability: Convolutional Dynamic Alignment Networks
Moritz Böhle, Mario Fritz, Bernt Schiele
TL;DR
CoDA Nets introduce Dynamic Alignment Units that output input-dependent linear transforms $o = \mathbf{w}(\vec{x})^\top \vec{x}$, producing model-inherent contribution maps that align with discriminative input patterns. The framework combines dynamic linearity with an alignment bias, implemented through bounded weight norms and efficient eDAU variants, enabling both high classification performance and faithful explanations. Empirical results show competitive accuracy on CIFAR-10 and TinyImagenet, superior detail in attribution maps compared to post-hoc methods, and the potential to create hybrid networks that increase interpretable depth while leveraging standard CNNs. Temperature scaling and embedding-depth interpolation offer practical knobs to trade off interpretability and accuracy, supporting scalable, interpretable vision models.
Abstract
We introduce a new family of neural network models called Convolutional Dynamic Alignment Networks (CoDA Nets), which are performant classifiers with a high degree of inherent interpretability. Their core building blocks are Dynamic Alignment Units (DAUs), which are optimised to transform their inputs with dynamically computed weight vectors that align with task-relevant patterns. As a result, CoDA Nets model the classification prediction through a series of input-dependent linear transformations, allowing for linear decomposition of the output into individual input contributions. Given the alignment of the DAUs, the resulting contribution maps align with discriminative input patterns. These model-inherent decompositions are of high visual quality and outperform existing attribution methods under quantitative metrics. Further, CoDA Nets constitute performant classifiers, achieving on par results to ResNet and VGG models on e.g. CIFAR-10 and TinyImagenet. Lastly, CoDA Nets can be combined with conventional neural network models to yield powerful classifiers that more easily scale to complex datasets such as Imagenet whilst exhibiting an increased interpretable depth, i.e., the output can be explained well in terms of contributions from intermediate layers within the network.
