Dense Feature Learning via Linear Structure Preservation in Medical Data
Yuanyun Zhang, Mingxuan Zhang, Siyuan Li, Zihan Wang, Haoran Chen, Wenbo Zhou, Shi Li
TL;DR
Dense feature learning reframes medical representation learning as shaping the linear structure of embeddings rather than optimizing task-specific predictions. By jointly enforcing spectral spreading, subspace consistency, and feature orthogonality, the method learns a well-conditioned, high-rank basis $Z$ that preserves clinically meaningful variation across time and modalities. Empirical results on longitudinal EHR, clinical text, and multimodal data show higher effective rank, improved conditioning, more stable subspaces, and stronger linear transfer to downstream tasks, even though no task labels are used during representation learning. This geometry-centric approach suggests that exposing the structure of the data can enhance the robustness, interpretability, and reusability of medical AI systems, complementing existing supervised and self-supervised paradigms.
Abstract
Deep learning models for medical data are typically trained with task-specific objectives that encourage representations to collapse onto a small number of discriminative directions. While effective for individual prediction problems, this paradigm underutilizes the rich structure of clinical data and limits the transferability, stability, and interpretability of learned features. In this work, we propose dense feature learning, a representation-centric framework that explicitly shapes the linear structure of medical embeddings. Our approach operates directly on embedding matrices, encouraging spectral balance, subspace consistency, and feature orthogonality through objectives defined entirely in terms of linear-algebraic properties. Without relying on labels or generative reconstruction, dense feature learning produces representations with higher effective rank, improved conditioning, and greater stability across time. Empirical evaluations on longitudinal EHR data, clinical text, and multimodal patient representations demonstrate consistent improvements in downstream linear performance, robustness, and subspace alignment compared to supervised and self-supervised baselines. These results suggest that learning to span clinical variation may be as important as learning to predict clinical outcomes, and they position representation geometry as a first-class objective in medical AI.
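The abstract refers to effective rank and conditioning of an embedding matrix without defining them at this point. As a rough illustration only, the sketch below computes both from the singular values of a matrix $Z$ (rows are samples, columns are features). It assumes one common definition of effective rank, the exponential of the Shannon entropy of the normalized singular values, and takes the condition number as the ratio of the largest to the smallest singular value; the paper's own definitions may differ. The function names and the `eps` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def effective_rank(Z, eps=1e-12):
    """Entropy-based effective rank of Z (assumed definition, not the paper's)."""
    s = np.linalg.svd(Z, compute_uv=False)   # singular values, descending
    p = s / (s.sum() + eps)                  # normalize to a distribution
    p = p[p > eps]                           # drop numerically zero entries
    entropy = -(p * np.log(p)).sum()         # Shannon entropy of the spectrum
    return float(np.exp(entropy))


def condition_number(Z, eps=1e-12):
    """Ratio of largest to smallest singular value (assumed definition)."""
    s = np.linalg.svd(Z, compute_uv=False)
    return float(s[0] / max(s[-1], eps))     # guard against division by zero


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Embeddings with orthonormal columns: a balanced spectrum.
    Q, _ = np.linalg.qr(rng.normal(size=(50, 4)))
    # Collapsed embeddings: every row is a multiple of the same direction.
    R = np.ones((50, 1)) @ np.arange(1.0, 5.0).reshape(1, 4)
    print("balanced :", effective_rank(Q), condition_number(Q))
    print("collapsed:", effective_rank(R), condition_number(R))
```

Under these assumed definitions, a matrix with a perfectly balanced spectrum has effective rank equal to its number of columns and a condition number of 1, while a collapsed (rank-one) matrix has effective rank close to 1 and a very large condition number.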
