Interpretable Vision Transformers in Image Classification via SVDA
Vasileios Arampatzakis, George Pavlidis, Nikolaos Mitianoudis, Nikos Papamarkos
TL;DR
The paper tackles the interpretability gap in Vision Transformers (ViTs) by adapting SVDA, an SVD-inspired attention mechanism that decouples directional information from spectral importance through soft-orthonormal projections and a learned diagonal matrix $\Sigma$ in the attention computation $A \sim Q \Sigma K^\top$. Integrating SVDA into ViTs while preserving architectural compatibility, the authors show that attention becomes more structured and sparse without sacrificing accuracy on CIFAR-10, CIFAR-100, FashionMNIST, and ImageNet-100. They also apply six interpretability indicators (spectral entropy, effective rank, spectral sparsity, angular alignment, selectivity index, and perturbation robustness) to diagnose attention dynamics at the head and layer level throughout training. The results show that SVDA maintains competitive performance while providing richer, geometry-grounded interpretability, laying a foundation for explainable AI, spectral diagnostics, and potential attention-regularization strategies in vision models.
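To make the computation $A \sim Q \Sigma K^\top$ concrete, here is a minimal NumPy sketch for a single head. It is an illustration under stated assumptions, not the authors' implementation: the softmax normalization, the penalty $\lVert W^\top W - I \rVert_F^2$ used to express "soft-orthonormal", and the names `svda_attention`, `soft_orthonormality_penalty`, `W_q`, `W_k`, and `sigma` are all hypothetical choices made for this example.

```python
import numpy as np

def svda_attention(X, W_q, W_k, sigma):
    """Attention weights from scores Q diag(sigma) K^T (illustrative sketch).

    X:     (n_tokens, d_model) token embeddings
    W_q:   (d_model, r) query projection
    W_k:   (d_model, r) key projection
    sigma: (r,) diagonal of the learned matrix Sigma
    """
    Q = X @ W_q                      # (n_tokens, r) query directions
    K = X @ W_k                      # (n_tokens, r) key directions
    scores = (Q * sigma) @ K.T       # same as Q @ diag(sigma) @ K.T
    # Assumption: a row-wise softmax turns scores into attention weights.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

def soft_orthonormality_penalty(W):
    """Assumed regularizer: squared Frobenius distance of W^T W from identity."""
    r = W.shape[1]
    gram = W.T @ W
    return float(np.sum((gram - np.eye(r)) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
# QR gives projections with exactly orthonormal columns, so the penalty is ~0.
W_q, _ = np.linalg.qr(rng.standard_normal((8, 4)))
W_k, _ = np.linalg.qr(rng.standard_normal((8, 4)))
sigma = np.array([2.0, 1.0, 0.5, 0.1])
A = svda_attention(X, W_q, W_k, sigma)
```

In this sketch the entries of `sigma` weight each projected direction separately, which is one way to read the claim that spectral importance is decoupled from direction: inspecting `sigma` shows which directions dominate the scores.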
Abstract
Vision Transformers (ViTs) have achieved state-of-the-art performance in image classification, yet their attention mechanisms often remain opaque and exhibit dense, unstructured behavior. In this work, we adapt our previously proposed SVD-Inspired Attention (SVDA) mechanism to the ViT architecture, introducing a geometrically grounded formulation that enhances interpretability, sparsity, and spectral structure. We use the interpretability indicators originally proposed with SVDA to monitor attention dynamics during training and to assess structural properties of the learned representations. Experimental evaluations on four widely used benchmarks (CIFAR-10, FashionMNIST, CIFAR-100, and ImageNet-100) demonstrate that SVDA consistently yields more interpretable attention patterns without sacrificing classification accuracy. While the current framework offers descriptive insights rather than prescriptive guidance, our results establish SVDA as a comprehensive and informative tool for analyzing and developing structured attention models in computer vision. This work lays the foundation for future advances in explainable AI, spectral diagnostics, and attention-based model compression.
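As a rough illustration of how a spectral indicator could be monitored, the sketch below computes spectral entropy and effective rank from a vector of spectral values. The definitions used here (Shannon entropy of the normalized magnitudes, and its exponential) are common conventions assumed for this example and may differ from the definitions used with SVDA; the function names are hypothetical.

```python
import numpy as np

def spectral_entropy(sigma, eps=1e-12):
    """Shannon entropy of the normalized spectrum |sigma_i| / sum_j |sigma_j|."""
    p = np.abs(np.asarray(sigma, dtype=float))
    p = p / (p.sum() + eps)          # normalize magnitudes to a distribution
    return float(-np.sum(p * np.log(p + eps)))

def effective_rank(sigma, eps=1e-12):
    """Exponential of the spectral entropy (assumed definition)."""
    return float(np.exp(spectral_entropy(sigma, eps)))

flat = np.ones(4)                        # all directions equally important
peaked = np.array([1.0, 0.0, 0.0, 0.0])  # one dominant direction
flat_rank = effective_rank(flat)
peaked_rank = effective_rank(peaked)
```

Under these assumed definitions, a flat spectrum of length 4 gives an effective rank close to 4 and a spectrum concentrated on one value gives an effective rank close to 1, so a falling value during training would indicate that a head is concentrating on fewer directions.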
