Preventing Model Collapse in Deep Canonical Correlation Analysis by Noise Regularization

Junlin He, Jinxiao Du, Susu Xu, Wei Ma

TL;DR

NR-DCCA equips DCCA with a novel noise regularization approach that prevents model collapse; the approach also generalizes to other DCCA-based methods such as DGCCA.

Abstract

Multi-View Representation Learning (MVRL) aims to learn a unified representation of an object from multi-view data. Deep Canonical Correlation Analysis (DCCA) and its variants share simple formulations and demonstrate state-of-the-art performance. However, with extensive experiments, we observe the issue of model collapse, i.e., the performance of DCCA-based methods drops drastically as training proceeds. The model collapse issue could significantly hinder the wide adoption of DCCA-based methods because it is challenging to decide when to stop training early. To this end, we develop NR-DCCA, which is equipped with a novel noise regularization approach to prevent model collapse. Theoretical analysis shows that the Correlation Invariant Property is the key to preventing model collapse, and our noise regularization forces the neural network to possess such a property. A framework to construct synthetic data with different common and complementary information is also developed to compare MVRL methods comprehensively. The developed NR-DCCA outperforms baselines stably and consistently on both synthetic and real-world datasets, and the proposed noise regularization approach can also be generalized to other DCCA-based methods such as DGCCA.

Paper Structure

This paper contains 34 sections, 8 theorems, 27 equations, 11 figures, 5 tables.

Key Result

Theorem 1

Given that $W_k$ is a square matrix for any $k$ and $\eta_k = \left| \mathrm{Corr}(W_k X_k, W_k A_k) - \mathrm{Corr}(X_k, A_k) \right|$, we have $\eta_k = 0$ (i.e., CIP holds) $\iff$ $W_k$ is full-rank.
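
A small numerical check can make Theorem 1 concrete. The sketch below is illustrative only and is not the authors' code: it assumes that $\mathrm{Corr}(\cdot,\cdot)$ is the sum of canonical correlations between two $d \times n$ matrices (this summary does not spell out the definition), and the names `orth_basis`, `corr_sum`, `W_full`, and `W_def` are hypothetical. It draws a random stand-in for $X_k$ and for the noise $A_k$, then compares $\eta_k$ for a full-rank and a rank-deficient $W_k$.

```python
import numpy as np

def orth_basis(M, tol=1e-10):
    """Orthonormal basis for the column space of M (n x d), via SVD."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, s > tol * s.max()]

def corr_sum(X, A):
    """Assumed Corr: sum of canonical correlations between X and A (each d x n)."""
    # Center each feature and put samples in rows.
    Xc = (X - X.mean(axis=1, keepdims=True)).T
    Ac = (A - A.mean(axis=1, keepdims=True)).T
    Qx, Qa = orth_basis(Xc), orth_basis(Ac)
    # Canonical correlations are the singular values of Qx^T Qa.
    return np.linalg.svd(Qx.T @ Qa, compute_uv=False).sum()

rng = np.random.default_rng(0)
d, n = 5, 200
X = rng.normal(size=(d, n))        # stand-in for one view X_k
A = rng.normal(size=(d, n))        # stand-in for the noise A_k

W_full = rng.normal(size=(d, d))   # a random Gaussian matrix is full-rank almost surely
W_def = W_full.copy()
W_def[-1] = W_def[0]               # duplicate a row, so the rank drops to d - 1

base = corr_sum(X, A)
eta_full = abs(corr_sum(W_full @ X, W_full @ A) - base)
eta_def = abs(corr_sum(W_def @ X, W_def @ A) - base)
print(f"eta (full-rank W):      {eta_full:.2e}")
print(f"eta (rank-deficient W): {eta_def:.2e}")
```

Under these assumptions, the full-rank case should give an $\eta_k$ at the level of floating-point error, because an invertible $W_k$ leaves the row spaces of $X_k$ and $A_k$ unchanged, while the rank-deficient case shrinks those subspaces and typically yields a clearly nonzero $\eta_k$.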

Figures (11)

  • Figure 1: Eigenvalue distributions of the first linear layer's weight matrices in the encoder of the first view.
  • Figure 2: Illustration of NR-DCCA. We take the CUB dataset as an example: similar to DCCA, the $k$-th view $X_k$ is transformed using $f_k$ to obtain a new representation $f_k(X_k)$, and the correlation between the new representations is then maximized. Additionally, for the $k$-th view, we incorporate the proposed NR loss to regularize $f_k$.
  • Figure 3: Construction of a synthetic dataset. This example consists of $2$ views and $n$ objects, and the common rate is $0\%$.
  • Figure 4: (a) Mean and standard deviation of the (D)CCA-based method performance across synthetic datasets in different training epochs. (b) The mean correlation between noise and real data after transformation varies with epochs. (c) Average NESum across all weights within the trained encoders. (d,e) The mean of reconstruction and denoising loss on the test set.
  • Figure 5: Performance of different methods in real-world datasets. Each column represents the performance on a specific dataset. The number of views in the dataset is denoted in the parentheses next to the dataset name.
  • ...and 6 more figures

Theorems & Definitions (11)

  • Theorem 1: Correlation Invariant Property (CIP) of $W_k$
  • Theorem 2: Effects of CIP on the obtained representations
  • Definition 1: Common Rate
  • Lemma 1
  • Definition 2
  • Lemma 2: MPI-based CCA
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Lemma 6
  • ...and 1 more