Discovering Common Information in Multi-view Data

Qi Zhang, Mingfei Lu, Shujian Yu, Jingmin Xin, Badong Chen

TL;DR

An innovative and mathematically rigorous definition for computing common information from multi-view data is introduced, drawing inspiration from Gács-Körner common information in information theory, and a novel supervised multi-view learning framework is developed to capture both common and unique information.

Abstract

We introduce an innovative and mathematically rigorous definition for computing common information from multi-view data, drawing inspiration from Gács-Körner common information in information theory. Leveraging this definition, we develop a novel supervised multi-view learning framework to capture both common and unique information. By explicitly minimizing a total correlation term, the extracted common information and the unique information from each view are forced to be independent of each other, which, in turn, theoretically guarantees the effectiveness of our framework. To estimate information-theoretic quantities, our framework employs the matrix-based Rényi's $\alpha$-order entropy functional, which forgoes the need for variational approximation and distributional estimation in high-dimensional space. We provide a theoretical proof that our framework can faithfully discover both common and unique information from multi-view data. Experiments on synthetic data and seven real-world benchmark datasets demonstrate the superior performance of our proposed framework over state-of-the-art approaches.
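The matrix-based Rényi's $\alpha$-order entropy functional mentioned in the abstract can be illustrated with a minimal NumPy sketch. It follows the standard formulation (eigenvalues of a trace-normalized Gram matrix, with joint entropy taken from the Hadamard product of Gram matrices) and is not the authors' implementation. The Gaussian kernel, the bandwidth `sigma=1.0`, the order `alpha=1.01`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def gram_matrix(x, sigma=1.0):
    """Trace-normalized Gaussian Gram matrix for samples x of shape (n, d).

    The kernel choice and bandwidth are illustrative assumptions.
    """
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=1.01):
    """Matrix-based Renyi alpha-order entropy (in bits) of a normalized Gram matrix."""
    lam = np.clip(np.linalg.eigvalsh(a), 0.0, None)  # drop tiny negative eigenvalues
    return np.log2(np.sum(lam ** alpha)) / (1.0 - alpha)


def joint_entropy(mats, alpha=1.01):
    """Joint entropy from the trace-normalized Hadamard product of Gram matrices."""
    h = mats[0]
    for m in mats[1:]:
        h = h * m  # element-wise (Hadamard) product
    return renyi_entropy(h / np.trace(h), alpha)


def total_correlation(mats, alpha=1.01):
    """Sum of marginal entropies minus the joint entropy."""
    return sum(renyi_entropy(m, alpha) for m in mats) - joint_entropy(mats, alpha)


# Toy data (hypothetical): one variable independent of c, one nearly a copy of c.
rng = np.random.default_rng(0)
c = rng.normal(size=(200, 1))
u_indep = rng.normal(size=(200, 1))
u_dep = c + 0.05 * rng.normal(size=(200, 1))

tc_indep = total_correlation([gram_matrix(c), gram_matrix(u_indep)])
tc_dep = total_correlation([gram_matrix(c), gram_matrix(u_dep)])
print(f"TC (independent): {tc_indep:.3f}  TC (dependent): {tc_dep:.3f}")
```

Because the estimate depends only on the eigenvalues of Gram matrices, no density model or variational bound is needed. The estimate is sensitive to the kernel bandwidth, so comparisons are only meaningful at a fixed `sigma`.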

Paper Structure (31 sections, 2 theorems, 21 equations, 6 figures, 2 tables)

Key Result

Proposition 1

Define where $\mathcal{C}$, $\mathcal{U}^{(1)}$, $\dots$, $\mathcal{U}^{(v)}$ are mutually independent. Then, for any set of invertible transformations $\{f_i\}_{i=1}^v$, the random variable $\mathcal{Z}^*$ optimized from Equation (\ref{eq:prop1}) represents the common information $\mathcal{C}$ that is concealed

Figures (6)

  • Figure 1: Schematic representation of the common and unique multi-view information (CUMI) learning framework. The CUMI framework is designed to learn a joint representation, denoted as $\mathcal{Z}$, which comprises common features $\mathcal{C}$ and unique features $\mathcal{U}$. The encoder $\phi_{\mathcal{C}}$ is responsible for extracting common features, guided by our definition of the multi-view common information criterion as per Equation (\ref{eq:extend_GK}). Unique features pertaining to the $i$-th view, represented as $\mathcal{U}^{(i)}$, are extracted by the independent encoder $\phi_{\mathcal{U}}^{(i)}$, which is coupled with a reconstruction network $\psi_{(i)}$. To keep the components of $\mathcal{Z}$ independent of one another, we introduce a total correlation constraint, denoted as $\text{TC}$.
  • Figure 2: Comparison of various methods in discovering common and unique information from multi-view data. Ground-truth (\ref{fig:sanity_groundtruth}) displays common (red) and unique information (blue, green). Results for CCA (\ref{fig:sanity_cca}), DCCA (\ref{fig:sanity_dcca}), MIB (\ref{fig:sanity_mib}), VCI (kleinman2022gacs), and the proposed method (\ref{fig:sanity_our}) show common (red, magenta) and unique information (blue, green). Our proposed approach closely aligns with the ground truth.
  • Figure 3: Convergence behavior of the $\text{MSE}$ for the common features $\mathcal{C}$. The curves $\text{MSE}(\mathcal{C};\mathcal{C}^{(i)}), i=1,2$ measure the discrepancy between $\mathcal{C}$ and the features $\mathcal{C}^{(i)}$ from different views. Both curves decrease and converge, which confirms the convergence of $\mathcal{C}$ and validates our method's ability to extract consistent common representations from multiple views.
  • Figure 4: Independence Analysis of $\mathcal{C}$ and $\mathcal{U}$. The figure presents the curves of TC and HSIC as measures of the independence between $\mathcal{C}$ and $\mathcal{U}$. These metrics quantify the degree of dependence between $\mathcal{C}$ and each $\mathcal{U}^{(i)}$. The experimental results demonstrate the effectiveness of our constraints in promoting independence between $\mathcal{C}$ and $\mathcal{U}^{(i)}$. The curves exhibit a decreasing trend, providing evidence for the ability of our approach to enforce the desired independence between these variables.
  • Figure 5: Results of the Nemenyi test indicating significant differences among the methods. The groups consisting of DCCA, CCA, and DUA-Net; CCA, DUA-Net, and MVSS; DUA-Net, MVSS, and TMC; MVSS, TMC, WeightReg, and MEIB; and MEIB and CUMI exhibit no significant differences. However, our method, CUMI, exhibits significant differences compared to the majority of the other methods.
  • ...and 1 more figure
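Figure 4 uses HSIC as one of its independence measures. As a rough illustration, the sketch below computes the standard biased empirical HSIC estimator, $\operatorname{tr}(KHLH)/(n-1)^2$, with Gaussian kernels. The kernel, the bandwidth `sigma=1.0`, the toy data, and the function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for samples x of shape (n, d); bandwidth is an assumption."""
    sq = np.sum(x ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2, with H the centering matrix."""
    n = x.shape[0]
    k = gaussian_kernel(x, sigma)
    l = gaussian_kernel(y, sigma)
    h = np.eye(n) - np.ones((n, n)) / n
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2


# Toy stand-ins (hypothetical) for common features and two candidate unique features.
rng = np.random.default_rng(0)
c = rng.normal(size=(300, 2))
u_indep = rng.normal(size=(300, 2))           # drawn independently of c
u_dep = c + 0.1 * rng.normal(size=(300, 2))   # strongly dependent on c

hsic_indep = hsic(c, u_indep)
hsic_dep = hsic(c, u_dep)
print(f"HSIC (independent): {hsic_indep:.5f}  HSIC (dependent): {hsic_dep:.5f}")
```

Values near zero are consistent with independence, while larger values indicate dependence. This is the sense in which a decreasing HSIC curve supports the independence of $\mathcal{C}$ and $\mathcal{U}^{(i)}$.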

Theorems & Definitions (3)

  • Definition 1
  • Proposition 1
  • Proposition 2