IB-AdCSCNet: Adaptive Convolutional Sparse Coding Network Driven by Information Bottleneck
He Zou, Meng'en Qin, Yu Song, Xiaohui Yang
TL;DR
IB-AdCSCNet addresses the challenge of preserving task-relevant information while discarding redundancy by integrating the information bottleneck (IB) objective into a convolutional sparse coding framework, with an adaptive $\lambda$ learned inside the FISTA iterations. The approach replaces standard convolutional layers with IB-driven Adaptive Convolutional Sparse Coding (IB-AdCSC) layers, so that a multi-layer IB trade-off is learned end-to-end and $\lambda$ can be adapted at test time for robustness. The authors provide theoretical grounding that connects mutual information objectives to sparse coding, and report accuracy competitive with residual networks on CIFAR-10/100 together with notable improvements on corrupted data. This work advances interpretable, robust feature learning by combining information bottleneck principles with sparse representation in deep networks.
Abstract
A persistent challenge for neural network models is retaining task-relevant information while discarding redundant data during propagation. In this paper, we introduce IB-AdCSCNet, a deep learning model grounded in information bottleneck (IB) theory. IB-AdCSCNet integrates the information bottleneck trade-off strategy into deep networks by dynamically adjusting the trade-off hyperparameter $\lambda$ through gradient descent, updating it within the FISTA (Fast Iterative Shrinkage-Thresholding Algorithm) framework. By optimizing the compressive excitation loss function induced by the information bottleneck principle, IB-AdCSCNet achieves an optimal balance between compression and fitting at a global level, approximating the globally optimal feature representation. This task-driven information bottleneck trade-off strategy not only helps the model learn effective features of the data, but also improves its generalization. The contribution of this study is a model with consistent performance and a fresh perspective on merging deep learning with sparse representation theory, grounded in the information bottleneck concept. Experimental results on the CIFAR-10 and CIFAR-100 datasets demonstrate that IB-AdCSCNet matches the performance of deep residual convolutional networks and outperforms them on corrupted data. Through inference of the IB trade-off, the model's robustness is notably enhanced.
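To make the role of $\lambda$ in FISTA concrete, the following is a minimal sketch of plain FISTA for the sparse coding problem $\min_z \tfrac{1}{2}\|x - Dz\|_2^2 + \lambda\|z\|_1$. It is an illustration under stated assumptions, not the authors' implementation: it uses a dense dictionary `D` instead of convolutional filters, and it holds `lam` fixed, whereas the paper updates $\lambda$ by gradient descent inside the iterations. The names `soft_threshold`, `fista`, `D`, `x`, and `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def fista(D, x, lam, n_iter=200):
    """Approximately minimize 0.5 * ||x - D z||^2 + lam * ||z||_1 over z.

    Assumption: D is a dense 2-D dictionary and lam is a fixed scalar;
    the paper instead uses convolutional dictionaries and a learned lambda.
    """
    # Lipschitz constant of the gradient of the quadratic term.
    L = np.linalg.norm(D, 2) ** 2
    z = np.zeros(D.shape[1])
    y = z.copy()  # extrapolated point
    t = 1.0       # momentum parameter
    for _ in range(n_iter):
        grad = D.T @ (D @ y - x)
        # Gradient step followed by shrinkage; lam sets the threshold.
        z_next = soft_threshold(y - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z_next + ((t - 1.0) / t_next) * (z_next - z)
        z, t = z_next, t_next
    return z
```

In this sketch a larger `lam` raises the shrinkage threshold and yields a sparser code (more compression), while a smaller `lam` favors reconstruction fidelity (more fitting), which is the trade-off the abstract describes as being tuned by the downstream task.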
