Towards the Generalization of Multi-view Learning: An Information-theoretical Analysis
Wen Wen, Tieliang Gong, Yuxin Dong, Shujian Yu, Weizhan Zhang
TL;DR
This work develops information-theoretic generalization bounds for multi-view learning in both reconstruction and classification tasks, showing that capturing both consensus and complementary information across views yields maximally disentangled representations and improved generalization. It introduces a scalable, data-dependent framework that uses one-dimensional auxiliary variables and typical-set arguments to derive leave-one-out (LOO), supersample, and fast-rate bounds, with rates on the order of $\tilde{O}(1/\sqrt{nm})$ in general and $1/(nm)$ in the interpolating regime. The bounds depend on information measures involving the common component $C$ and the view-unique components $U^{(j)}$, specifically $H(C)$, $H(U^{(j)})$, and $I(X^{(j)};C,U^{(j)}|Y)$, and they motivate the multi-view information bottleneck (IB) regularizer. Empirical results on synthetic and real datasets show that the proposed bounds closely track the generalization gap, supporting the theoretical advantage of multi-view learning. These findings provide a principled foundation for designing multi-view learning algorithms that balance representation power and generalization, with potential impact in multi-sensor fusion and cross-domain perception tasks.
Abstract
Multi-view learning has drawn widespread attention for its efficacy in leveraging cross-view consensus and complementary information to achieve a comprehensive representation of data. While multi-view learning has undergone vigorous development and achieved remarkable success, the theoretical understanding of its generalization behavior remains elusive. This paper aims to bridge this gap by developing information-theoretic generalization bounds for multi-view learning, with a particular focus on multi-view reconstruction and classification tasks. Our bounds underscore the importance of capturing both consensus and complementary information from multiple views to achieve maximally disentangled representations. These results also indicate that applying the multi-view information bottleneck regularizer is beneficial for satisfactory generalization performance. Additionally, we derive novel data-dependent bounds under both leave-one-out and supersample settings, yielding computationally tractable and tighter bounds. In the interpolating regime, we further establish a fast-rate bound for multi-view learning, which converges faster than conventional square-root bounds. Numerical results indicate a strong correlation between the true generalization gap and the derived bounds across various learning scenarios.
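To give a sense of what "faster than square-root" means numerically, the following minimal sketch compares the two decay orders, $1/\sqrt{nm}$ and $1/(nm)$. It assumes $n$ denotes the number of samples and $m$ the number of views (our reading of the notation, not confirmed here), and it ignores all constants, logarithmic factors, and information-measure terms that appear in the actual bounds. The function names are hypothetical and only illustrate the arithmetic.

```python
import math


def sqrt_rate(n: int, m: int) -> float:
    """Order of a conventional square-root bound, 1/sqrt(n*m) (constants dropped)."""
    return 1.0 / math.sqrt(n * m)


def fast_rate(n: int, m: int) -> float:
    """Order of a fast-rate bound, 1/(n*m) (constants dropped)."""
    return 1.0 / (n * m)


def rate_table(ns, m):
    """Return (n, square-root order, fast order) for each n at a fixed m."""
    return [(n, sqrt_rate(n, m), fast_rate(n, m)) for n in ns]


if __name__ == "__main__":
    # Assumed illustrative values; m = 2 is an arbitrary choice.
    for n, slow, fast in rate_table([100, 1000, 10000], m=2):
        print(f"n={n:>6d}  1/sqrt(nm)={slow:.2e}  1/(nm)={fast:.2e}")
```

Since $1/(nm) = (1/\sqrt{nm})^2$, the fast-rate order is strictly smaller whenever $nm > 1$, and the ratio between the two orders grows as $\sqrt{nm}$.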
