Structured IB: Improving Information Bottleneck with Structured Feature Learning
Hanzhe Yang, Youlong Wu, Dingzhu Wen, Yong Zhou, Yuanming Shi
TL;DR
This work addresses the challenge of extracting maximally informative representations under compression in Information Bottleneck (IB) learning by introducing Structured IB (SIB), which augments a primary encoder with multiple auxiliary encoders to capture complementary information. The aggregated representation $\hat{Z}=w_0 Z+\sum_{i=1}^K w_i Z_i$ is trained in a three-stage process that combines the IB Lagrangian objective with a discriminator-based independence constraint to ensure feature distinctiveness. Empirically, SIB variants yield higher $I(Z,Y)$ for a fixed $I(X,Z)$ and achieve improved accuracy with smaller networks on MNIST and CIFAR-10, while revealing insights about the contribution of auxiliary features and weight dynamics. The framework offers a principled, parameter-efficient approach to enhancing IB-based learning and can be extended to other IB formulations and more expressive feature aggregations.
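The aggregation $\hat{Z}=w_0 Z+\sum_{i=1}^K w_i Z_i$ can be illustrated with a minimal NumPy sketch. The function name `aggregate`, the array shapes, and the use of fixed scalar weights are assumptions made for illustration only; the summary above does not say how the weights are parameterized, normalized, or updated during the three training stages.

```python
import numpy as np

def aggregate(z, aux, weights):
    """Weighted sum w_0 * z + sum_i w_i * aux[i] (illustrative helper, not the authors' code).

    z       : array of shape (batch, dim), output of the primary encoder
    aux     : sequence of K arrays of shape (batch, dim), auxiliary encoder outputs
    weights : sequence of K + 1 scalars (w_0, w_1, ..., w_K); assumed fixed here
    """
    z = np.asarray(z, dtype=float)
    aux = np.asarray(aux, dtype=float)          # shape (K, batch, dim)
    weights = np.asarray(weights, dtype=float)  # shape (K + 1,)
    if weights.shape[0] != aux.shape[0] + 1:
        raise ValueError("expected one weight per encoder (primary + K auxiliary)")
    # Contract the K auxiliary weights against the leading axis of aux.
    return weights[0] * z + np.tensordot(weights[1:], aux, axes=1)

# Toy example with K = 2 auxiliary features of constant value.
z = np.ones((2, 3))
aux = [2.0 * np.ones((2, 3)), 3.0 * np.ones((2, 3))]
z_hat = aggregate(z, aux, [0.5, 0.25, 0.25])
```

In this toy case every entry of `z_hat` equals 0.5·1 + 0.25·2 + 0.25·3 = 1.75, which shows how the weights control each encoder's contribution to the aggregated representation.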
Abstract
The Information Bottleneck (IB) principle has emerged as a promising approach for enhancing the generalization, robustness, and interpretability of deep neural networks, demonstrating efficacy across image segmentation, document clustering, and semantic communication. Among IB implementations, the IB Lagrangian method, which employs Lagrange multipliers, is widely adopted. Numerous methods optimize the IB Lagrangian using variational bounds or neural estimators, but their performance depends heavily on the quality of these designs, which are inherently prone to error. To address this limitation, we introduce Structured IB, a framework for investigating potential structured features. By incorporating auxiliary encoders to extract informative features that the primary encoder misses, we generate more informative representations. Our experiments demonstrate superior prediction accuracy and task-relevant information preservation compared to the original IB Lagrangian method, even with reduced network size.
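For orientation, the IB Lagrangian mentioned above is commonly written as a trade-off between task-relevant information and compression. The form below is the widely used textbook statement, given in the same $I(\cdot,\cdot)$ notation as the summary; the sign convention and the placement of the multiplier $\beta$ are assumptions here, since conventions differ across papers and the abstract does not state which one is used.

$$
\max_{p(z \mid x)} \; I(Z,Y) - \beta \, I(X,Z), \qquad \beta \ge 0 .
$$

Under this convention, a larger $\beta$ penalizes $I(X,Z)$ more strongly and therefore yields a more compressed representation, while a smaller $\beta$ favors retaining information about $Y$.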
