IndiSeek learns information-guided disentangled representations

Yu Gui; Cong Ma; Zongming Ma

TL;DR

IndiSeek learns disentangled, information-preserving representations from multi-modal data. It first extracts shared cross-modal features with CLIP, then learns modality-specific features that are independent of the shared ones. An NCE-CLUB upper bound on mutual information enforces disentanglement, and a reconstruction-based surrogate enforces completeness, so modality-specific signals can be extracted even under nonlinear dependencies and redundant shared information. In experiments on synthetic simulations, a CITE-seq dataset, and diverse MultiBench benchmarks, IndiSeek outperforms state-of-the-art disentanglement baselines and improves downstream task performance while remaining computationally efficient. The work also outlines task-related extensions and gives practical guidance for parameter tuning.

Abstract

Learning disentangled representations is a fundamental task in multi-modal learning. In modern applications such as single-cell multi-omics, both shared and modality-specific features are critical for characterizing cell states and supporting downstream analyses. Ideally, modality-specific features should be independent of shared ones while also capturing all complementary information within each modality. This tradeoff is naturally expressed through information-theoretic criteria, but mutual-information-based objectives are difficult to estimate reliably, and their variational surrogates often underperform in practice. In this paper, we introduce IndiSeek, a novel disentangled representation learning approach that addresses this challenge by combining an independence-enforcing objective with a computationally efficient reconstruction loss that bounds conditional mutual information. This formulation explicitly balances independence and completeness, enabling principled extraction of modality-specific features. We demonstrate the effectiveness of IndiSeek on synthetic simulations, a CITE-seq dataset, and multiple real-world multi-modal benchmarks.
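
To make the independence-plus-completeness balance concrete, the following is a minimal sketch and not the authors' implementation. It assumes scalar features, a fixed-variance Gaussian variational model, and a plain CLUB-style sample estimate (matched pairs minus all pairs) in place of the NCE-CLUB term named in the TL;DR. The names `predict`, `weight`, and all function names are hypothetical and chosen only for illustration.

```python
import math


def gaussian_log_density(value, mean, sigma=1.0):
    """Log density of a normal distribution N(mean, sigma^2) at `value`."""
    return -0.5 * ((value - mean) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def club_style_estimate(shared, specific, predict, sigma=1.0):
    """CLUB-style dependence estimate between shared and specific features.

    Assumption: q(specific | shared) is Gaussian with mean predict(shared).
    The estimate is the average log-likelihood over matched pairs minus the
    average over all (i, j) pairs; it is zero when matching carries no signal.
    """
    n = len(shared)
    matched = sum(
        gaussian_log_density(specific[i], predict(shared[i]), sigma) for i in range(n)
    ) / n
    all_pairs = sum(
        gaussian_log_density(specific[j], predict(shared[i]), sigma)
        for i in range(n)
        for j in range(n)
    ) / (n * n)
    return matched - all_pairs


def reconstruction_mse(inputs, reconstructions):
    """Mean squared reconstruction error, used here as the completeness surrogate."""
    return sum((x - r) ** 2 for x, r in zip(inputs, reconstructions)) / len(inputs)


def combined_objective(shared, specific, predict, inputs, reconstructions, weight=1.0):
    """Independence penalty plus a weighted reconstruction term (hypothetical weighting)."""
    independence = club_style_estimate(shared, specific, predict)
    completeness = reconstruction_mse(inputs, reconstructions)
    return independence + weight * completeness
```

In this toy setting, a specific feature that copies the shared one yields a positive penalty, while a constant specific feature yields zero, so minimizing the first term pushes toward independence and the second term penalizes discarding information needed for reconstruction.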
