
Multi-View Majority Vote Learning Algorithms: Direct Minimization of PAC-Bayesian Bounds

Mehdi Hennequin, Abdelkrim Zitouni, Khalid Benabdeslem, Haytham Elghazel, Yacine Gaci

TL;DR

The paper advances multi-view learning by deriving in-probability PAC-Bayesian bounds based on Rényi divergence for a hierarchical view-voter framework. The framework enables view-specific regularization through a per-view order α_v and a hyper-prior/hyper-posterior over views. It extends first- and second-order oracle bounds and the C-bound to the multi-view setting, and introduces self-bounding optimization algorithms that directly minimize these bounds. The approach yields tighter high-probability generalization guarantees and can exploit unlabeled data through disagreement terms. Experiments show strong performance on diverse datasets. Overall, the work bridges theory and practice in multi-view PAC-Bayes and provides a flexible, scalable framework for robust multi-view ensemble learning.
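The hierarchical view-voter structure can be sketched as follows. The sketch assumes binary labels in {-1, +1} and a finite set of voters per view. Each voter h_j^v receives the effective weight ρ_v Q_v(h_j^v), and the majority vote takes the sign of the weighted sum of votes. The Gibbs risk and the expected disagreement are the empirical quantities that bounds of this kind are built on. The disagreement never looks at labels, which is what lets unlabeled data enter the picture. All function names and the `preds[v][j][i]` layout are hypothetical illustrations, not the authors' code.

```python
from typing import List, Sequence

# Hypothetical layout (not from the paper): preds[v][j][i] is the
# +1/-1 output of voter j of view v on example i.
Predictions = Sequence[Sequence[Sequence[int]]]


def voter_weights(rho: Sequence[float],
                  posteriors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Effective weight of voter h_j^v: rho_v * Q_v(h_j^v)."""
    return [[r_v * q for q in q_v] for r_v, q_v in zip(rho, posteriors)]


def majority_vote(preds: Predictions, rho, posteriors) -> List[int]:
    """Sign of the rho/Q_v-weighted sum of votes (ties broken toward +1)."""
    w = voter_weights(rho, posteriors)
    n = len(preds[0][0])
    out = []
    for i in range(n):
        score = sum(w[v][j] * preds[v][j][i]
                    for v in range(len(w)) for j in range(len(w[v])))
        out.append(1 if score >= 0 else -1)
    return out


def gibbs_risk(preds: Predictions, labels: Sequence[int], rho, posteriors) -> float:
    """E_{v~rho} E_{h~Q_v} of the empirical error rate of h (needs labels)."""
    w = voter_weights(rho, posteriors)
    n = len(labels)
    return sum(w[v][j] * sum(preds[v][j][i] != labels[i] for i in range(n)) / n
               for v in range(len(w)) for j in range(len(w[v])))


def disagreement(preds: Predictions, rho, posteriors) -> float:
    """Expected rate at which two voters drawn independently (v~rho, h~Q_v) disagree.

    No labels are used, so this can be estimated on unlabeled data.
    """
    w = voter_weights(rho, posteriors)
    flat = [(w[v][j], preds[v][j]) for v in range(len(w)) for j in range(len(w[v]))]
    n = len(flat[0][1])
    return sum(w1 * w2 * sum(a != b for a, b in zip(p1, p2)) / n
               for w1, p1 in flat for w2, p2 in flat)


# Toy example: 2 views, 2 voters per view, 4 examples (made-up data).
preds = [
    [[1, 1, -1, -1], [1, -1, -1, -1]],   # view 1
    [[1, 1, 1, -1], [-1, 1, -1, -1]],    # view 2
]
labels = [1, 1, -1, -1]
rho, posteriors = [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]]
print(majority_vote(preds, rho, posteriors),
      gibbs_risk(preds, labels, rho, posteriors),
      disagreement(preds, rho, posteriors))
```

Flattening ρ and the Q_v into one weight per voter is only a convenience for evaluation. The bounds themselves keep the two levels separate, so that each view can carry its own prior and divergence term.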

Abstract

The PAC-Bayesian framework has significantly advanced the understanding of statistical learning, particularly for majority voting methods. Despite its successes, its application to multi-view learning -- a setting with multiple complementary data representations -- remains underexplored. In this work, we extend PAC-Bayesian theory to multi-view learning, introducing novel generalization bounds based on Rényi divergence. These bounds provide an alternative to traditional Kullback-Leibler divergence-based counterparts, leveraging the flexibility of Rényi divergence. Furthermore, we propose first- and second-order oracle PAC-Bayesian bounds and extend the C-bound to multi-view settings. To bridge theory and practice, we design efficient self-bounding optimization algorithms that align with our theoretical results.
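For discrete distributions, the Rényi divergence of order α is D_α(Q‖P) = (1/(α−1)) log Σ_i q_i^α p_i^(1−α). It recovers the Kullback-Leibler divergence in the limit α → 1 and is nondecreasing in α. The helper below is a minimal sketch under two assumptions: P is strictly positive, and Q and P are plain probability vectors over a finite set of voters. The paper's per-view orders α_v and how they enter the bounds are not modeled here.

```python
import math
from typing import Sequence


def kl_divergence(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(Q || P) for discrete distributions; assumes p_i > 0 wherever q_i > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)


def renyi_divergence(q: Sequence[float], p: Sequence[float], alpha: float) -> float:
    """Renyi divergence D_alpha(Q || P) = log(sum_i q_i^alpha p_i^(1-alpha)) / (alpha - 1).

    Assumes alpha > 0 and a strictly positive p. alpha == 1 falls back to KL,
    which is the limit of D_alpha as alpha -> 1.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if alpha == 1.0:
        return kl_divergence(q, p)
    s = sum(qi ** alpha * pi ** (1.0 - alpha) for qi, pi in zip(q, p) if qi > 0)
    return math.log(s) / (alpha - 1.0)


posterior = [0.7, 0.2, 0.1]      # toy posterior Q over three voters
prior = [1 / 3, 1 / 3, 1 / 3]    # uniform prior P
for a in (0.5, 1.0, 1.1, 2.0):
    print(a, renyi_divergence(posterior, prior, a))
```

Because D_α grows with α, the order acts as a knob on how strongly the complexity term penalizes moving away from the prior. Figure 5 compares a fixed α = 1.1 with a learned α.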

Paper Structure

This paper contains 38 sections, 29 theorems, 70 equations, 17 figures, 16 tables, 2 algorithms.

Key Result

Corollary 2.1

Let $V \geq 2$ be the number of views. For any distribution $\mathcal{D}$ on $\mathcal{X} \times \mathcal{Y}$, for any set of prior distributions $\{\mathcal{P}_{v}\}_{v=1}^{V}$, and for any hyper-prior distribution $\pi$ over $[\![V]\!]$, with probability at least $1-\delta$ over a random draw of a
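Inequalities of the Seeger/Langford PAC-Bayes-kl type bound the binary kl between the empirical and true Gibbs risks. A numeric risk certificate is obtained by inverting that kl. Since the statement of Corollary 2.1 is truncated above, the sketch takes its right-hand side as a generic input `budget`. The helper `kl_budget` shows a commonly used KL-based form, (KL(Q‖P) + ln(2√m/δ))/m, purely for illustration. It is not the paper's Rényi-based complexity term. All names are hypothetical.

```python
import math


def binary_kl(q: float, p: float) -> float:
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), for q in [0, 1]."""
    p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0) at the endpoints
    out = 0.0
    if q > 0:
        out += q * math.log(q / p)
    if q < 1:
        out += (1 - q) * math.log((1 - q) / (1 - p))
    return out


def kl_inverse_upper(q_hat: float, budget: float, tol: float = 1e-9) -> float:
    """Largest p in [q_hat, 1] with kl(q_hat || p) <= budget, found by bisection.

    kl(q_hat || p) is increasing in p on [q_hat, 1], so bisection applies.
    """
    lo, hi = q_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binary_kl(q_hat, mid) <= budget:
            lo = mid
        else:
            hi = mid
    return lo


def kl_budget(divergence: float, m: int, delta: float) -> float:
    """Illustrative KL-based right-hand side: (KL(Q||P) + ln(2 sqrt(m) / delta)) / m."""
    return (divergence + math.log(2 * math.sqrt(m) / delta)) / m


# Toy numbers (made up): empirical Gibbs risk 0.1, divergence 5, m = 1000, delta = 0.05.
bound = kl_inverse_upper(0.1, kl_budget(5.0, 1000, 0.05))
print(bound)
```

Swapping in a different divergence (for example a Rényi term) only changes the `budget` passed to the inversion. The kl-inversion step itself stays the same.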

Figures (17)

  • Figure 1: Test error rates and PAC-Bayesian bounds for binary classification between labels 4 and 9 on the mfeat-large dataset, averaged over 10 runs. Each subplot represents a different view. Dotted bars ($\bullet$) indicate bounds, while slashed bars (\\) represent risks. Colors distinguish between bounds, risks, and methods within each subplot. The experiment uses KL divergence for single-view and Rényi divergence ($\alpha=1.1$) for multi-view, with a stump configuration and 50% labeled data. Multi-view results are highlighted in orange.
  • Figure 2: Test error rates and PAC-Bayesian bounds for multiclass classification on the mfeat-large dataset, averaged over 10 runs. Only the concatenated view and the multi-view results are shown (the full plot with all views is in the Appendix). The experiment uses the same configuration as Figure 1, with modifications to aid multiclass learning: strong learners (depth=20) and 100% labeled data. Multi-view results are highlighted in orange.
  • Figure 3: Hierarchical structure of multi-view distributions for $V=3$ views (adapted from Goyal17). Each view has voters $\mathcal{H}_v = \{h_1^v, \dots, h_2^v\}$ with a prior $\mathcal{P}_v$ before learning (a, blue), which is updated to a posterior $\mathcal{Q}_v$ after learning (b, blue). Likewise, a hyper-prior $\pi$ over views (a, orange) is updated to a hyper-posterior $\rho$ (b, orange). Rectangle heights represent the probability weights assigned to voters and views.
  • Figure 4: Test error rates and PAC-Bayesian bounds for binary classification between labels 4 and 9 on the mfeat-large dataset, averaged over 10 runs. The experiment uses KL divergence for single-view and Rényi divergence ($\alpha=1.1$) for multi-view, with stumps in (a), weak learners in (b), strong learners in (c), and 50% labeled data. Multi-view results are highlighted in orange.
  • Figure 5: Test error rates and PAC-Bayesian bounds for binary classification between labels 4 and 9 on the mfeat-large dataset, averaged over 10 runs. The experiment uses KL divergence for single-view and Rényi divergence for multi-view, comparing (a) a fixed $\alpha=1.1$ with (b) $\alpha$ learned as a parameter. Both settings use stumps and 20% labeled data. Multi-view results are highlighted in orange.
  • ...and 12 more figures

Theorems & Definitions (39)

  • Corollary 2.1: PAC-Bayes-kl Inequality based on Rényi Divergence, in the spirit of Seeger's and Langford's theorem (Seeger03; Langford05a)
  • Theorem 2.2: First-Order Multi-view Oracle Bound (Goyal17)
  • Theorem 2.3
  • Corollary 2.4
  • Corollary 2.5
  • Theorem 2.6
  • Corollary 2.7
  • Corollary 2.8
  • Corollary 2.9
  • Corollary 2.10
  • ...and 29 more
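The first-order oracle bound (Theorem 2.2), together with the second-order bounds and the C-bound mentioned in the abstract, relates the majority vote's risk to the Gibbs risk r, the tandem loss e (both voters wrong), and the expected disagreement d. As a sketch, assume binary labels in {-1, +1} and treat the hierarchical ρ/Q_v weights as one distribution over all voters. The classical oracle forms are then 2r, 4e, and 1 − (1 − 2r)²/(1 − 2d). The exact multi-view statements, and the self-bounding algorithms that minimize their empirical versions, are not reproduced here. In practice each oracle quantity is replaced by a high-probability bound: an upper bound on r or e, and a lower bound on d for the C-bound. Function names are hypothetical.

```python
def first_order_bound(gibbs_risk: float) -> float:
    """Majority-vote risk <= 2 * Gibbs risk."""
    return min(1.0, 2.0 * gibbs_risk)


def second_order_bound(tandem_loss: float) -> float:
    """Majority-vote risk <= 4 * tandem loss (probability that both drawn voters err)."""
    return min(1.0, 4.0 * tandem_loss)


def c_bound(gibbs_risk: float, disagreement: float) -> float:
    """Binary C-bound 1 - (1 - 2r)^2 / (1 - 2d); only informative when r < 1/2."""
    if gibbs_risk >= 0.5 or disagreement >= 0.5:
        return 1.0  # vacuous or undefined in this regime
    value = 1.0 - (1.0 - 2.0 * gibbs_risk) ** 2 / (1.0 - 2.0 * disagreement)
    return min(1.0, max(0.0, value))


# Toy oracle values (made up). For binary labels, 2r = d + 2e, so e = r - d / 2.
r, d = 0.1875, 0.28125
e = r - d / 2
print(first_order_bound(r), second_order_bound(e), c_bound(r, d))
```

With these toy values the C-bound (about 0.107) is tighter than both oracle bounds. Larger disagreement among the voters helps the C-bound, which is why unlabeled data can be useful for estimating d.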