AdaMuS: Adaptive Multi-view Sparsity Learning for Dimensionally Unbalanced Data

Cai Xu, Changhao Sun, Ziyu Guan, Wei Zhao

Abstract

Multi-view learning aims to fuse multiple features to describe data comprehensively. Most prior studies implicitly assume that different views have similar dimensionality. In practice, however, severe dimensional disparities often exist among views, giving rise to the unbalanced multi-view learning problem. For example, in emotion recognition tasks, video frames often have on the order of $10^6$ dimensions, while physiological signals comprise only about $10^1$. Existing methods face two main challenges in this setting: (1) they are often biased toward high-dimensional views and overlook the low-dimensional ones; (2) they struggle to align representations under extreme dimensional imbalance, which introduces severe redundancy into those of the low-dimensional views. To address these issues, we propose the Adaptive Multi-view Sparsity Learning (AdaMuS) framework. First, so that the information in low-dimensional views is not ignored, we construct view-specific encoders that map each view into a space of unified dimension. Because mapping low-dimensional data to a high-dimensional space often causes severe overfitting, we design a parameter-free pruning method that adaptively removes redundant parameters in the encoders. Furthermore, we propose a sparse fusion paradigm that flexibly suppresses redundant dimensions and effectively aligns the views. Additionally, to learn representations that generalize better, we propose a self-supervised learning paradigm that obtains supervision information by constructing similarity graphs. Extensive evaluations on a synthetic toy dataset and seven real-world benchmarks demonstrate that AdaMuS consistently achieves superior performance and generalizes well across both classification and semantic segmentation tasks.

Paper Structure (28 sections, 16 equations, 14 figures, 4 tables, 1 algorithm)

This paper contains 28 sections, 16 equations, 14 figures, 4 tables, 1 algorithm.

Figures (14)

  • Figure 1: Ubiquitous Dimensional Imbalance in Real-world Multi-view Applications. We illustrate four representative scenarios—(a) Medical Diagnosis, (b) Financial Analysis, (c) Emotion Recognition, and (d) Monitoring—where extreme dimensionality disparities (e.g., $\approx 10^6$ vs. $\approx 10^1$) between views are prevalent. This fundamental imbalance challenges conventional fusion methods, which often bias towards high-dimensional features.
  • Figure 2: Illustration of the proposed AdaMuS framework. AdaMuS establishes view-specific encoders $f^{(v)}(\cdot)$ to process unbalanced multi-view data. These encoders are structurally optimized by the Principal Neuron Analysis (PNA) module to learn aligned representations $\{z_n^v\}_{v=1}^V$ with a unified dimension. Specifically, a Multi-view Sparse Batch Normalization (MSBN) layer is proposed to explicitly integrate these features via a sparse fusion paradigm. Finally, we train AdaMuS in a self-supervised manner to guide the representation learning.
  • Figure 3: Learning curves of a DNN classifier on the low-dimensional view of the CUB dataset. The model projects the 30-dimensional input into a high-dimensional space (structure: 30–1024–512–10). The increasing gap between training and validation performance demonstrates that expanding low-dimensional data with deep networks leads to overfitting.
  • Figure 4: PNA workflow: (1) Compute covariance matrix from layer outputs; (2) Extract eigenvalue distribution; (3) Measure dissimilarity to Uniform/Dirac baselines; (4) Adjust pruning rate with view imbalance.
  • Figure 5: Construction process of the multi-view toy example.
  • ...and 9 more figures
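
The four PNA steps named in the Figure 4 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not give the dissimilarity measure or the pruning-rate rule, so the sketch assumes a normalized KL divergence from the uniform distribution (0 for a uniform spectrum, 1 for a Dirac spectrum) and a hypothetical imbalance factor `1 - view_dim / unified_dim`. The function names and the `max_rate` parameter are likewise hypothetical.

```python
import numpy as np


def eigenvalue_distribution(activations: np.ndarray) -> np.ndarray:
    """Steps 1-2: covariance of layer outputs, then its normalized spectrum.

    activations has shape (n_samples, n_neurons).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(activations.shape[0] - 1, 1)
    # eigvalsh returns real eigenvalues in ascending order for symmetric input;
    # clip tiny negative values caused by floating-point error.
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)[::-1]
    total = eigvals.sum()
    if total == 0.0:
        # Constant activations: fall back to a uniform spectrum (assumption).
        return np.full(eigvals.size, 1.0 / eigvals.size)
    return eigvals / total


def redundancy_score(p: np.ndarray) -> float:
    """Step 3 (assumed measure): KL(p || uniform) / log(n).

    Returns 0 when p is uniform (no redundant neurons) and 1 when p is a
    Dirac distribution (all variance lies along a single direction).
    """
    n = p.size
    if n < 2:
        return 0.0
    nonzero = p[p > 0]
    entropy = -(nonzero * np.log(nonzero)).sum()
    return float(1.0 - entropy / np.log(n))


def pruning_rate(score: float, view_dim: int, unified_dim: int,
                 max_rate: float = 0.9) -> float:
    """Step 4 (hypothetical rule): prune more when the view is much smaller
    than the unified dimension and its spectrum is highly concentrated."""
    imbalance = 1.0 - min(view_dim / unified_dim, 1.0)
    return float(np.clip(max_rate * score * imbalance, 0.0, max_rate))
```

For instance, calling `pruning_rate(redundancy_score(eigenvalue_distribution(h)), view_dim=30, unified_dim=1024)` on the outputs `h` of a hidden layer would give a rate close to `max_rate` when the neurons are nearly copies of one another, and a rate near zero when their variance is spread evenly.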