Learning to Compress: Local Rank and Information Compression in Deep Neural Networks

Niket Patel, Ravid Shwartz-Ziv

TL;DR

This paper investigates how deep multilayer perceptrons (MLPs) encode low-dimensional feature manifolds and connects this behavior to the Information Bottleneck (IB) theory. It introduces local rank as a measure of feature manifold dimensionality.

Abstract

Deep neural networks tend to exhibit a bias toward low-rank solutions during training, implicitly learning low-dimensional feature representations. This paper investigates how deep multilayer perceptrons (MLPs) encode these feature manifolds and connects this behavior to the Information Bottleneck (IB) theory. We introduce the concept of local rank as a measure of feature manifold dimensionality and demonstrate, both theoretically and empirically, that this rank decreases during the final phase of training. We argue that networks that reduce the rank of their learned representations also compress mutual information between inputs and intermediate layers. This work bridges the gap between feature manifold rank and information compression, offering new insights into the interplay between information bottlenecks and representation learning.
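To make the idea of local rank more concrete, the following is a minimal sketch rather than the authors' implementation. The formal definition (Definition 1) is not reproduced in this summary, so the sketch assumes that local rank at a layer is the average numerical rank of the Jacobian of that layer's pre-activation with respect to the input, taken over a set of sample inputs. The ReLU architecture, the function names `layer_jacobian` and `local_rank`, and the relative tolerance `tol` are all illustrative choices.

```python
import numpy as np

def layer_jacobian(weights, biases, x, layer):
    """Jacobian of the pre-activation of `layer` (1-indexed) w.r.t. the input x.

    Assumes a plain ReLU MLP, so the Jacobian is a product of weight
    matrices and 0/1 diagonal activation masks.
    """
    h = x
    jac = np.eye(x.shape[0])
    for l, (W, b) in enumerate(zip(weights, biases), start=1):
        pre = W @ h + b
        jac = W @ jac                      # chain rule through the linear map
        if l == layer:
            return jac
        mask = (pre > 0).astype(pre.dtype)  # derivative of ReLU at this input
        jac = mask[:, None] * jac           # chain rule through the ReLU
        h = np.maximum(pre, 0.0)
    raise ValueError("layer index exceeds network depth")

def local_rank(weights, biases, inputs, layer, tol=1e-6):
    """Average numerical Jacobian rank over `inputs` (assumed definition)."""
    ranks = []
    for x in inputs:
        s = np.linalg.svd(layer_jacobian(weights, biases, x, layer),
                          compute_uv=False)
        # Count singular values above a relative threshold (hypothetical choice).
        ranks.append(int(np.sum(s > tol * s.max())) if s.max() > 0 else 0)
    return float(np.mean(ranks))

# Usage: a 5 -> 8 -> 4 network whose first weight matrix has rank 2.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 5))
W2 = rng.normal(size=(4, 8))
ws, bs = [W1, W2], [np.zeros(8), np.zeros(4)]
xs = rng.normal(size=(16, 5))
print(local_rank(ws, bs, xs, layer=1), local_rank(ws, bs, xs, layer=2))
```

Because the first weight matrix is built with rank 2, the estimate at either layer cannot exceed 2 in this toy setup. Tracking such an estimate across training checkpoints is one way the decrease described in the abstract could be monitored, under the assumptions stated above.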

Paper Structure

This paper contains 25 sections, 8 theorems, 37 equations, 3 figures.

Key Result

Proposition 2

(Informal) Let $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n \subseteq \mathbb{R}^{n_0} \times \{-1, 1\}$ be a binary classification dataset. Assume there exists a fully connected neural network with weight matrices uniformly bounded by $B$ that correctly classifies every data point in $\mathcal{D}$ with ma where $\|W_l\|_\sigma$ denotes the operator norm of $W_l$.

Figures (3)

  • Figure 1: Reduction in Local Rank During Training. Left: A 3-layer MLP trained on synthetic Gaussian data. Right: A 4-layer MLP trained on MNIST. In both cases, the local rank decreases during the terminal phase of training, indicating compression of the feature manifold across all layers.
  • Figure 2: Empirical Results on Gaussian Data using Deep-VIB. Left: KL divergence component of the loss versus $\beta$, with points colored by empirical local rank corresponding to critical $\beta$ values. Right: Local rank as a function of $\beta$, showing an increase with $\beta$ and distinct phase transitions. We provide more information in Appendix \ref{sec:Gauss_DEEPCIB}.
  • Figure 3: Empirical Results on MNIST and Fashion-MNIST. Left: MNIST dataset. Right: Fashion-MNIST dataset. As we increase $\beta$, the local rank increases, and accuracy decreases, indicating that higher $\beta$ values correspond to less compressed representations and lower performance.

Theorems & Definitions (14)

  • Definition 1
  • Proposition 2
  • proof
  • Proposition 3
  • proof
  • Theorem 4
  • Lemma 5
  • proof
  • Theorem 6
  • Proposition 7
  • ...and 4 more