Information Bottleneck Approach to Spatial Attention Learning
Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu
TL;DR
This work addresses how to impose human-like information bottleneck (IB) constraints on attention in deep neural networks. It introduces AttVIB, an IB-inspired spatial attention framework that optimizes a variational objective to maximize $I(Z;Y)$ while minimizing $I(Z;X,A)$, where $Z$ is the attention-modulated representation, $X$ the input, $A$ the attention map, and $Y$ the task label. To control the information flow more tightly, attention scores are quantized to a set of learnable anchor values $\{v_i\}$. The resulting attention maps focus on informative regions, improve performance on image classification, fine-grained recognition, and cross-domain classification, and make model decisions easier to interpret. The authors report extensive experiments and release their code.
Abstract
The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information that reaches visual awareness when perceiving natural scenes, allowing near real-time information processing with limited computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanisms of deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information passed through the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism yields attention maps that neatly highlight the regions of interest while suppressing backgrounds, and boosts the performance of standard DNN structures on visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). The experiments also verify that the attention maps help interpret the decisions of the DNNs. Our code is available at https://github.com/ashleylqx/AIB.git.
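
The two ingredients named in the abstract, an IB-style trade-off and quantization of attention scores to anchor values, can be illustrated with a minimal NumPy sketch. It is not the released implementation. The function names, the fixed anchor values, the weight `beta`, and the use of a KL term to a standard normal as the compression penalty (a common choice in variational IB methods) are all assumptions made for illustration. In the paper the anchors are learned during training, which this sketch does not show.

```python
import numpy as np

def quantize_to_anchors(scores, anchors):
    """Map each continuous attention score to its nearest anchor value."""
    scores = np.asarray(scores, dtype=float)
    anchors = np.asarray(anchors, dtype=float)
    # Distance from every score to every anchor; pick the closest anchor.
    nearest = np.abs(scores[..., None] - anchors).argmin(axis=-1)
    return anchors[nearest]

def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over all elements."""
    mu = np.asarray(mu, dtype=float)
    log_var = np.asarray(log_var, dtype=float)
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def ib_style_loss(cross_entropy, mu, log_var, beta=0.01):
    """Prediction term plus beta times a compression penalty (assumed form)."""
    return cross_entropy + beta * kl_to_standard_normal(mu, log_var)

# Toy example: a (channels, height, width) feature map and a 2D attention map.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
attention = 1.0 / (1.0 + np.exp(-rng.normal(size=(4, 4))))  # scores in (0, 1)

anchors = np.array([0.0, 0.5, 1.0])  # hypothetical fixed anchors
quantized = quantize_to_anchors(attention, anchors)

# The 2D map broadcasts over the channel axis to modulate the features.
modulated = features * quantized
```

Quantization limits each spatial location to one of a few attention levels, which is one way to read the abstract's statement that it further restricts the information passed through the attention map.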
