Contrastive Dimension Reduction: A Systematic Review
Sam Hawke, Eric Zhang, Jiawen Chen, Didong Li
TL;DR
This paper surveys contrastive dimension reduction (CDR) methods, which isolate foreground-specific signal from background variation in high-dimensional data. It introduces a unifying pipeline and a taxonomy covering linear approaches (matrix-decomposition and model-based), nonlinear approaches, and structure-aware variants. The methods covered are CPCA, GCPCA, CCUR, PCPCA, CLVM, CPLVM, CVAE, CVI, CFS, CFPCA, CIR, and CLR, along with preprocessing strategies; their use is illustrated with a corrupted MNIST toy example and a mouse protein case study. The paper also discusses practical limitations, including hyperparameter selection, interpretability, and uncertainty quantification, and outlines future directions such as multi-modal data, multiple or continuous treatments, and connections to contrastive learning. By providing concrete tools, diagnostics (e.g., CDE), and principled background-selection strategies, the framework aims to broaden adoption of CDR and spur methodological advances with real-world impact in genomics, imaging, and time-series analysis. Key formulas such as $C = C_X - \gamma C_Y$ anchor the methods, and the emphasis on interpretability and uncertainty aligns CDR with practical scientific inference.
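To make the formula $C = C_X - \gamma C_Y$ concrete, here is a minimal NumPy sketch of a CPCA-style computation. It rests on assumptions the summary does not state: $C_X$ and $C_Y$ are taken to be the sample covariance matrices of the foreground and background data, $\gamma$ is a user-chosen contrast strength, and the contrastive directions are the leading eigenvectors of $C$. The function name `contrastive_pca` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def contrastive_pca(X, Y, gamma=1.0, k=2):
    """Illustrative CPCA-style sketch (not a reference implementation).

    X : (n_x, p) foreground samples (assumed role of X in the formula)
    Y : (n_y, p) background samples (assumed role of Y in the formula)
    gamma : contrast strength; gamma = 0 reduces to ordinary PCA on X
    k : number of contrastive directions to return
    """
    # Center each group separately before forming covariances.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)

    # Sample covariance matrices C_X and C_Y, each of shape (p, p).
    C_X = Xc.T @ Xc / (X.shape[0] - 1)
    C_Y = Yc.T @ Yc / (Y.shape[0] - 1)

    # Contrastive matrix C = C_X - gamma * C_Y (symmetric, possibly indefinite).
    C = C_X - gamma * C_Y

    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)

    # Keep the k directions with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order], eigvals[order]
```

Under these assumptions, setting `gamma=0` recovers ordinary PCA on the foreground, while larger values of `gamma` penalize directions with high background variance more heavily. Foreground samples can then be projected with `(X - X.mean(axis=0)) @ V`, where `V` is the first returned array.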
Abstract
Contrastive dimension reduction (CDR) methods aim to extract signal unique to or enriched in a treatment (foreground) group relative to a control (background) group. This setting arises in many scientific domains, such as genomics, imaging, and time series analysis, where traditional dimension reduction techniques such as Principal Component Analysis (PCA) may fail to isolate the signal of interest. In this review, we provide a systematic overview of existing CDR methods. We propose a pipeline for analyzing case-control studies together with a taxonomy of CDR methods based on their assumptions, objectives, and mathematical formulations, unifying disparate approaches under a shared conceptual framework. We highlight key applications and challenges in existing CDR methods, and identify open questions and future directions. By providing a clear framework for CDR and its applications, we aim to facilitate broader adoption and motivate further developments in this emerging field.
