Differentiable Information Bottleneck for Deterministic Multi-view Clustering

Xiaoqiang Yan; Zhixiang Jin; Fengshou Han; Yangdong Ye

TL;DR

This work proposes a new differentiable information bottleneck (DIB) method, which provides a deterministic and analytical MVC solution by fitting the mutual information without the necessity of variational approximation.

Abstract

In recent years, the information bottleneck (IB) principle has provided an information-theoretic framework for deep multi-view clustering (MVC) by compressing multi-view observations while preserving the relevant information of multiple views. Although existing IB-based deep MVC methods have achieved huge success, they rely on variational approximation and distributional assumptions to estimate the lower bound of mutual information, which is a notoriously hard and impractical problem in high-dimensional multi-view spaces. In this work, we propose a new differentiable information bottleneck (DIB) method, which provides a deterministic and analytical MVC solution by fitting the mutual information without the necessity of variational approximation. Specifically, we first propose to directly fit the mutual information of high-dimensional spaces by leveraging a normalized kernel Gram matrix, which does not require any auxiliary neural estimator to estimate the lower bound of mutual information. Then, based on the new mutual information measurement, a deterministic multi-view neural network with analytical gradients is explicitly trained to parameterize the IB principle, which yields a deterministic compression of the input variables from different views. Finally, a triplet consistency discovery mechanism is devised, which mines the feature consistency, cluster consistency and joint consistency based on the deterministic and compact representations. Extensive experimental results show the superiority of our DIB method on 6 benchmarks compared with 13 state-of-the-art baselines.

Paper Structure (20 sections, 2 theorems, 15 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work and Preliminaries
Information Bottleneck
Deep Multi-view Clustering
Differentiable Information Bottleneck
Problem Statement
Mutual Information without Variational Approximation
Deterministic Compression
Triplet Consistency Discovery
Experiments
Datasets
Implementation
Baselines
Evaluation Metrics
Performance Analysis
...and 5 more sections

Key Result

Proposition 1

Rényi's $\alpha$-order entropy of a random variable $\textbf{X} \in \mathcal{R}^{N\times D^v}$ can be fitted by the eigenvalues of a Gram matrix that is constructed by evaluating a positive definite kernel function on each pair of data points.
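
As a rough illustration of Proposition 1, the sketch below computes an entropy value from the eigenvalues of a normalized Gaussian-kernel Gram matrix. It uses the standard matrix-based Rényi formulation, $S_\alpha(A) = \frac{1}{1-\alpha}\log_2 \sum_i \lambda_i(A)^\alpha$, together with the usual Hadamard-product form of the joint entropy; whether these match the paper's definitions term by term is an assumption. The function names (`normalized_gram`, `renyi_entropy`, `joint_entropy`, `mutual_information`), the kernel width `sigma` and the default `alpha=2.0` are illustrative choices, not taken from the paper.

```python
import numpy as np


def normalized_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix of the rows of x, scaled to unit trace.

    sigma is a hypothetical kernel width chosen for illustration.
    """
    sq_norms = np.sum(x * x, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    k = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return k / np.trace(k)


def renyi_entropy(a, alpha=2.0):
    """Matrix-based Renyi alpha-order entropy (in bits) of a unit-trace PSD matrix."""
    if alpha == 1.0:
        raise ValueError("alpha must differ from 1 in this formulation")
    eigvals = np.linalg.eigvalsh(a)        # a is symmetric, so eigvalsh applies
    eigvals = np.clip(eigvals, 0.0, None)  # remove tiny negative rounding errors
    return np.log2(np.sum(eigvals ** alpha)) / (1.0 - alpha)


def joint_entropy(a, b, alpha=2.0):
    """Joint entropy via the trace-normalized Hadamard product of two Gram matrices."""
    ab = a * b
    return renyi_entropy(ab / np.trace(ab), alpha)


def mutual_information(a, b, alpha=2.0):
    """Mutual information as S(A) + S(B) - S(A, B); no neural estimator involved."""
    return renyi_entropy(a, alpha) + renyi_entropy(b, alpha) - joint_entropy(a, b, alpha)
```

Every step is an eigendecomposition or an elementwise operation, so the same computation can be written in an automatic-differentiation framework and trained end to end. Two sanity checks follow from the formula: identical samples give an entropy of zero, and $N$ well-separated samples give $\log_2 N$ bits.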

Figures (5)

Figure 1: Variational approximation and deterministic measurement. (a) Variational approximation requires a neural estimator for the posterior distribution $p(z|x)$ of the representation and assumes that the marginal distribution of the representation is standard normal, in order to approximate the lower bound of the mutual information. (b) Our deterministic measurement uses a Gaussian kernel function to construct the kernel Gram matrix, which measures the distance between data pairs. The eigenvalues of the Gram matrix can then be expressed as an entropy function (see the section "Mutual Information without Variational Approximation" for the detailed proof).
Figure 2: Framework of the DIB. In DIB, the deterministic compression aims to learn a compact representation for each view through the new mutual information measurement without variational approximation. The triplet consistency discovery mechanism is devised to mine the feature, cluster and joint consistency from the compact representation.
Figure 3: Sensitivity experiment results for the parameters $\gamma$ and $\beta$.
Figure 4: Convergence curves on MNIST-USPS and ESP.
Figure 5: Mutual information with/without VA on the data pairs sampled from MNIST-USPS.

Theorems & Definitions (8)

Definition 1: Differentiable information bottleneck, DIB
Definition 2: Gram matrix
Proposition 1
proof
Definition 3: Matrix-based Rényi's $\alpha$-order entropy function
Definition 4: Matrix-based Rényi's $\alpha$-order joint-entropy function
Proposition 2
proof
