Generalization Guarantees for Multi-View Representation Learning and Application to Regularization via Gaussian Product Mixture Prior

Milad Sefidgaran, Abdellatif Zaidi, Piotr Krasnowski

TL;DR

This work develops a principled MDL-based framework for generalization guarantees in distributed multi-view representation learning. It derives in-expectation, tail, and lossy generalization bounds expressed via the MDL of latent representations under symmetric priors, and uses these bounds to guide regularizer design. The authors introduce data-dependent Gaussian mixture priors and two regimes (lossless and lossy) to regularize encoders, with a further extension to Gaussian product mixture priors for multi-view settings that can exploit or induce redundancy across views. Empirical results show that the Gaussian mixture MDL regularizers improve on the single-view VIB/CDVIB baselines and that the Gaussian product mixture regularizers outperform no-regularization in multi-view scenarios, while revealing an attention-like mechanism that governs prior component weighting. Overall, the framework provides scalable, distributed regularization insights that link encoder structure, redundancy, and generalization performance in multi-view systems.

Abstract

We study the problem of distributed multi-view representation learning. In this problem, each of $K$ agents observes a distinct, possibly statistically correlated, view and independently extracts from it a suitable representation, such that a decoder that receives all $K$ representations correctly estimates the hidden label. In the absence of any explicit coordination between the agents, a central question is: what should each agent extract from its view that is necessary and sufficient for a correct estimation at the decoder? In this paper, we investigate this question from a generalization error perspective. First, we establish several generalization bounds in terms of the relative entropy between the distribution of the representations extracted from training and "test" datasets and a data-dependent symmetric prior, i.e., the Minimum Description Length (MDL) of the latent variables for all views and for both the training and test datasets. Then, we use the obtained bounds to devise a regularizer, and we investigate in depth the question of selecting a suitable prior. In particular, we show, and illustrate through experiments, that our data-dependent Gaussian mixture priors with judiciously chosen weights lead to good performance. For single-view settings (i.e., $K=1$), our approach is shown experimentally to outperform the existing Variational Information Bottleneck (VIB) and Category-Dependent VIB (CDVIB) approaches. Interestingly, we show that a weighted attention mechanism emerges naturally in this setting. Finally, for the multi-view setting, we show that selecting the joint prior as a Gaussian product mixture induces a Gaussian mixture marginal prior for each view and implicitly encourages the agents to extract and output redundant features, a finding which is somewhat counter-intuitive.
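
The last claim of the abstract, that a Gaussian product mixture joint prior induces a Gaussian mixture marginal for each view, can be checked in one line. The following is a minimal sketch in assumed notation that is not taken from the paper: $M$ mixture components, weights $\alpha_m$ summing to one, and per-view means $\mu_{k,m}$ and covariances $\Sigma_{k,m}$ for the representation $u_k$ of view $k$.

```latex
% Assumed notation (illustrative only): M components, weights \alpha_m \ge 0
% with \sum_m \alpha_m = 1, and per-view Gaussian parameters (\mu_{k,m}, \Sigma_{k,m}).
\begin{align}
  Q(u_1,\dots,u_K)
    &= \sum_{m=1}^{M} \alpha_m \prod_{k=1}^{K}
       \mathcal{N}\!\left(u_k;\, \mu_{k,m}, \Sigma_{k,m}\right), \\
  Q(u_k)
    &= \int Q(u_1,\dots,u_K) \prod_{j \neq k} \mathrm{d}u_j
     = \sum_{m=1}^{M} \alpha_m\,
       \mathcal{N}\!\left(u_k;\, \mu_{k,m}, \Sigma_{k,m}\right).
\end{align}
```

Each factor with $j \neq k$ integrates to one, so only the $k$-th Gaussian of every component survives and the marginal is again a Gaussian mixture with the same weights. Under this form, the component index $m$ is shared by all views, so the joint prior does not factorize across views even though each component does.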

Paper Structure

This paper contains 32 sections, 5 theorems, 124 equations, 3 figures, 4 tables.

Key Result

Theorem 2

Consider a $C$-class $K$-view classification problem and a learning algorithm $\mathcal{A}\colon \mathcal{Z}^n\to \mathcal{W}$ that induces the joint distribution $(S,S',\mathbf{U},\mathbf{U'},W) \sim P_{S'} P_{S,W} P_{\mathbf{U}|\mathbf{X},W_e}P_{\mathbf{U}'|\mathbf{X}',W_e}$. Then, for any symmet

Figures (3)

  • Figure 1: Distributed multi-view representation learning setup.
  • Figure 2: Values of $h_C\left( \hat{\mathcal{L}}(\mathbf{y},\hat{\mathbf{y}}),\hat{\mathcal{L}}(\mathbf{y}',\hat{\mathbf{y}}') ;\epsilon\right)$ as a function of the generalization error for the CIFAR10 dataset.
  • Figure 3: Comparison of the generalization bounds of Theorem \ref{th:generalizationExp_hd} and Theorem \ref{th:generalizationExp_old} for the CIFAR10 dataset.

Theorems & Definitions (6)

  • Definition 1: Symmetric prior
  • Theorem 2: sefidgaran2023minimum
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Lemma 6