LSMI-Sinkhorn: Semi-supervised Mutual Information Estimation with Optimal Transport
Yanbin Liu, Makoto Yamada, Yao-Hung Hubert Tsai, Tam Le, Ruslan Salakhutdinov, Yi Yang
TL;DR
The paper addresses mutual information estimation when only a small number of paired samples are available, by leveraging abundant unpaired samples from the marginals. It introduces the Least-Squares Mutual Information with Sinkhorn (LSMI-Sinkhorn) framework, which models a density ratio $r_{\boldsymbol{\alpha}}(x,y)$ and jointly optimizes a transport plan $\boldsymbol{\Pi}$ and parameters $\boldsymbol{\alpha}$ via alternating updates, with Sinkhorn-based optimization providing $O(n_x n_y)$ complexity. The key contributions are: formulating semi-supervised squared-loss mutual information (SMI) estimation as a joint density-ratio fitting and optimal transport problem; establishing monotone convergence; and demonstrating strong performance on synthetic data, deep image matching, and photo album summarization, with favorable runtime. Code is available at https://github.com/csyanbin/LSMI-Sinkhorn. The work makes MI estimation more practical under limited supervision, with potential applications to cross-domain alignment and semi-supervised representation learning.
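To make the Sinkhorn step concrete, the following is a minimal, generic sketch of entropy-regularized optimal transport in NumPy. It is not the authors' implementation: the function name `sinkhorn_plan`, the regularization strength `eps`, the uniform marginals, and the random cost matrix are all assumptions chosen for illustration. In LSMI-Sinkhorn the cost would instead be derived from the density-ratio model $r_{\boldsymbol{\alpha}}$.

```python
import numpy as np

def sinkhorn_plan(cost, a, b, eps=0.5, n_iters=200):
    """Generic entropic OT plan via Sinkhorn scaling (illustrative sketch).

    cost: (n_x, n_y) cost matrix; a, b: marginal weights summing to 1.
    """
    K = np.exp(-cost / eps)            # Gibbs kernel, shape (n_x, n_y)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)                # rescale rows to match marginal a
        v = b / (K.T @ u)              # rescale columns to match marginal b
    return u[:, None] * K * v[None, :]

# Hypothetical toy problem: random costs, uniform marginals.
rng = np.random.default_rng(0)
n_x, n_y = 5, 7
cost = rng.random((n_x, n_y))
a = np.full(n_x, 1.0 / n_x)
b = np.full(n_y, 1.0 / n_y)
plan = sinkhorn_plan(cost, a, b)
```

Each iteration consists of two matrix-vector products with the $n_x \times n_y$ kernel, which is where the $O(n_x n_y)$ cost comes from.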
Abstract
Estimating mutual information is an important problem in statistics and machine learning. To estimate mutual information from data, a common practice is to prepare a set of paired samples $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n \stackrel{\mathrm{i.i.d.}}{\sim} p(\mathbf{x},\mathbf{y})$. However, in many situations it is difficult to obtain a large number of data pairs. To address this problem, we propose a semi-supervised Squared-loss Mutual Information (SMI) estimation method that uses a small number of paired samples together with the available unpaired ones. We first represent SMI through the density-ratio function, where the expectation is approximated using samples from the marginals and their assignment parameters. The objective is formulated using the optimal transport problem and quadratic programming. We then introduce the Least-Squares Mutual Information with Sinkhorn (LSMI-Sinkhorn) algorithm for efficient optimization. Through experiments, we first demonstrate that the proposed method can estimate SMI without a large number of paired samples. We then show the effectiveness of the proposed LSMI-Sinkhorn algorithm on several machine learning problems, such as image matching and photo album summarization. Code can be found at https://github.com/csyanbin/LSMI-Sinkhorn.
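For reference, SMI is commonly defined as the Pearson divergence between the joint density and the product of the marginals. The standard form is written below in terms of the density ratio $r(\mathbf{x},\mathbf{y})$; the notation is assumed here and may differ slightly from the paper's.

$$
\mathrm{SMI}(X,Y) = \frac{1}{2}\iint p(\mathbf{x})\,p(\mathbf{y})\,\bigl(r(\mathbf{x},\mathbf{y}) - 1\bigr)^2\,\mathrm{d}\mathbf{x}\,\mathrm{d}\mathbf{y},
\qquad
r(\mathbf{x},\mathbf{y}) = \frac{p(\mathbf{x},\mathbf{y})}{p(\mathbf{x})\,p(\mathbf{y})}.
$$

Expanding the square and using $\iint p(\mathbf{x})\,p(\mathbf{y})\,r(\mathbf{x},\mathbf{y})\,\mathrm{d}\mathbf{x}\,\mathrm{d}\mathbf{y} = 1$ gives the equivalent form

$$
\mathrm{SMI}(X,Y) = \frac{1}{2}\,\mathbb{E}_{p(\mathbf{x},\mathbf{y})}\bigl[r(\mathbf{x},\mathbf{y})\bigr] - \frac{1}{2},
$$

so estimating SMI reduces to estimating the density ratio and an expectation over the joint distribution, which is the quantity that is hard to approximate when few paired samples are available.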
