
Information Bottleneck Approach to Spatial Attention Learning

Qiuxia Lai, Yu Li, Ailing Zeng, Minhao Liu, Hanqiu Sun, Qiang Xu

TL;DR

This work addresses how to inject human-like information bottleneck constraints into deep neural network attention. It introduces the AttVIB framework, an IB-inspired spatial attention mechanism that optimizes a variational objective to maximize $I(Z;Y)$ while minimizing $I(Z;X,A)$. Here $X$ is the input, $A$ the attention map, $Z$ the attention-modulated latent representation, and $Y$ the task label. It also quantizes attention scores to a learnable anchor set $\{v_i\}$ for stronger information control. The approach yields attention maps that focus on informative regions, improves performance on image classification, fine-grained recognition, and cross-domain tasks, and makes model decisions easier to interpret. The authors provide extensive experiments and release code to support reproducibility.
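
For orientation, the objective above can be written as an IB Lagrangian and made tractable with the standard deep variational information bottleneck (VIB) bounds. This is a sketch under stated assumptions, not necessarily the paper's exact loss: it assumes $Z$ depends on $Y$ only through $(X, A)$, a stochastic encoder $p_\theta(z\mid x,a)$, a variational classifier $q(y\mid z)$, a variational prior $r(z)$, and a trade-off weight $\beta > 0$.

$$\max_{\theta}\; I(Z;Y) - \beta\, I(Z;X,A)$$

$$I(Z;Y) \ge \mathbb{E}_{p(x,a,y)}\,\mathbb{E}_{p_\theta(z\mid x,a)}\big[\log q(y\mid z)\big] + H(Y)$$

$$I(Z;X,A) \le \mathbb{E}_{p(x,a)}\Big[\mathrm{KL}\big(p_\theta(z\mid x,a)\,\|\,r(z)\big)\Big]$$

$$\mathcal{L} = \mathbb{E}\big[-\log q(y\mid z)\big] + \beta\,\mathbb{E}\Big[\mathrm{KL}\big(p_\theta(z\mid x,a)\,\|\,r(z)\big)\Big]$$

Because $H(Y)$ does not depend on $\theta$, minimizing $\mathcal{L}$ maximizes a lower bound on the Lagrangian: the first term rewards predictive information, and the KL term penalizes information retained from the input and attention map.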

Abstract

The selective visual attention mechanism in the human visual system (HVS) restricts the amount of information that reaches visual awareness when perceiving natural scenes, allowing near real-time information processing with limited computational capacity [Koch and Ullman, 1987]. This kind of selectivity acts as an 'Information Bottleneck (IB)', which seeks a trade-off between information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanisms of deep neural networks (DNNs). In this paper, we propose an IB-inspired spatial attention module for DNN structures built for visual recognition. The module takes as input an intermediate representation of the input image and outputs a variational 2D attention map that minimizes the mutual information (MI) between the attention-modulated representation and the input, while maximizing the MI between the attention-modulated representation and the task label. To further restrict the information that passes through the attention map, we quantize the continuous attention scores to a set of learnable anchor values during training. Extensive experiments show that the proposed IB-inspired spatial attention mechanism yields attention maps that neatly highlight the regions of interest while suppressing backgrounds, and boosts standard DNN structures on visual recognition tasks (e.g., image classification, fine-grained recognition, cross-domain classification). As verified in the experiments, the attention maps help interpret the decisions made by the DNNs. Our code is available at https://github.com/ashleylqx/AIB.git.
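
A minimal NumPy sketch of the two attention steps described above: sampling a continuous variational 2D attention map, then snapping each score to its nearest anchor value before modulating the feature map. Several details are assumptions rather than facts from the paper: Gaussian attention logits squashed by a sigmoid, hard nearest-anchor assignment, a fixed anchor array standing in for the learnable $\{v_i\}$, and the helper names `sample_attention`, `quantize_to_anchors`, and `modulate`, which are hypothetical.

```python
import numpy as np

def sample_attention(mu, log_sigma, rng):
    """Draw a continuous (H, W) attention map with the reparameterization trick.

    Assumption (not from the paper): per-location Gaussian logits with mean
    `mu` and log-std `log_sigma`, squashed to (0, 1) by a sigmoid.
    """
    eps = rng.standard_normal(mu.shape)
    logits = mu + np.exp(log_sigma) * eps
    return 1.0 / (1.0 + np.exp(-logits))

def quantize_to_anchors(a, anchors):
    """Replace every attention score with its nearest anchor value.

    `anchors` stands in for the learnable set {v_i}; it is a fixed,
    hypothetical array here because this sketch shows only the forward pass.
    """
    anchors = np.asarray(anchors, dtype=float)
    # Distance from every score to every anchor: shape a.shape + (K,)
    dist = np.abs(a[..., None] - anchors)
    idx = np.argmin(dist, axis=-1)
    return anchors[idx]

def modulate(features, a_q):
    """Scale a (C, H, W) feature map by an (H, W) attention map."""
    return features * a_q[None, :, :]

rng = np.random.default_rng(0)
mu = rng.standard_normal((4, 4))
log_sigma = np.full((4, 4), -1.0)
a = sample_attention(mu, log_sigma, rng)
a_q = quantize_to_anchors(a, anchors=[0.0, 0.5, 1.0])  # hypothetical anchors
features = rng.standard_normal((8, 4, 4))
z_in = modulate(features, a_q)
print(np.unique(a_q))
```

The nearest-anchor snap is not differentiable. In an autodiff framework, one common workaround (assumed here, not confirmed by the source) is a straight-through estimator such as `a + stop_gradient(a_q - a)`, so the forward pass uses `a_q` while gradients flow to `a`; how the paper makes the anchors themselves learnable is not shown in this sketch.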

Paper Structure

This paper contains 20 sections, 20 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Graphical model of the probabilistic neural network with the IB-inspired spatial attention mechanism (see the theory section).
  • Figure 2: Framework of the IB-inspired spatial attention mechanism for visual recognition. The input $x$ is passed through an attention module to produce a continuous variational attention map $a$, which is quantized to a discrete attention map $a_q$ using a set of learnable anchor values $v_i$. Then, $a_q$ and $x$ are encoded to a latent vector $z$, which is decoded to a prediction $y$. The loss function is given in the paper's loss-function equation; see the theory and quantization sections.
  • Figure 3: Visualization of attention maps for interpretability (see the interpretability analysis section).
  • Figure 4: Ablation study on CIFAR-10 with a VGG backbone (see the ablation study section).
  • Figure 5: Effect of attention score quantization. See the ablation study section for details.
  • ...and 1 more figure
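
Figure 2 encodes $a_q$ and $x$ into a latent $z$ and decodes it to $y$. A minimal NumPy sketch of a VIB-style loss for that encode/decode step follows: cross-entropy on the prediction plus $\beta$ times the KL divergence from a Gaussian encoder to a prior. This is the standard deep VIB loss under assumptions, not the paper's exact equation: the encoder is assumed diagonal Gaussian, the prior $r(z)$ is assumed to be $\mathcal{N}(0, I)$, the linear decoder `W` is hypothetical, and random `mu`/`log_var` stand in for the output of an attention-modulated encoder.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dims."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def vib_loss(logits, labels, mu, log_var, beta):
    """Prediction term plus beta times the batch-averaged KL compression term."""
    kl = gaussian_kl_to_standard_normal(mu, log_var).mean()
    return cross_entropy(logits, labels) + beta * kl

rng = np.random.default_rng(0)
batch, latent_dim, num_classes = 8, 16, 10
# Stand-ins for the encoder output computed from (x, a_q)
mu = rng.standard_normal((batch, latent_dim))
log_var = 0.1 * rng.standard_normal((batch, latent_dim))
# Reparameterized sample z = mu + sigma * eps
z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)
W = 0.1 * rng.standard_normal((latent_dim, num_classes))  # hypothetical decoder
labels = rng.integers(0, num_classes, size=batch)
loss = vib_loss(z @ W, labels, mu, log_var, beta=1e-3)
print(loss)
```

A small `beta` keeps the compression term from overwhelming the classification term early in training; the value used here is illustrative only.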