Dense Feature Learning via Linear Structure Preservation in Medical Data

Yuanyun Zhang; Mingxuan Zhang; Siyuan Li; Zihan Wang; Haoran Chen; Wenbo Zhou; Shi Li

TL;DR

Dense feature learning reframes medical representation learning as shaping the linear structure of embeddings rather than optimizing task-specific predictions. By jointly enforcing spectral spreading, subspace consistency, and feature orthogonality, the method learns a well-conditioned, high-rank basis $Z$ that preserves clinically meaningful variation across time and modalities. Empirical results on longitudinal EHR, clinical text, and multimodal data show higher effective rank, improved conditioning, more stable subspaces, and stronger linear transfer to downstream tasks, even though no task labels are used during representation learning. These results suggest that exposing the structure of the data can improve the robustness, interpretability, and reusability of medical AI systems, complementing existing supervised and self-supervised paradigms.

Abstract

Deep learning models for medical data are typically trained with task-specific objectives that encourage representations to collapse onto a small number of discriminative directions. While effective for individual prediction problems, this paradigm underutilizes the rich structure of clinical data and limits the transferability, stability, and interpretability of learned features. In this work, we propose dense feature learning, a representation-centric framework that explicitly shapes the linear structure of medical embeddings. Our approach operates directly on embedding matrices, encouraging spectral balance, subspace consistency, and feature orthogonality through objectives defined entirely in terms of linear-algebraic properties. Without relying on labels or generative reconstruction, dense feature learning produces representations with higher effective rank, improved conditioning, and greater stability across time. Empirical evaluations on longitudinal EHR data, clinical text, and multimodal patient representations demonstrate consistent improvements in downstream linear performance, robustness, and subspace alignment compared to supervised and self-supervised baselines. These results suggest that learning to span clinical variation may be as important as learning to predict clinical outcomes, and they position representation geometry as a first-class objective in medical AI.
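The abstract refers to effective rank and conditioning of an embedding matrix without defining them at this point. As a rough illustration only, the sketch below computes two commonly used diagnostics with NumPy: an entropy-based effective rank and the ratio of largest to smallest singular value. Both definitions, the feature centering, the function names, and the synthetic matrices are assumptions made for this example; the paper's own definitions may differ.

```python
import numpy as np


def singular_values(Z):
    """Singular values of the feature-centered embedding matrix (rows = samples)."""
    Zc = Z - Z.mean(axis=0, keepdims=True)
    return np.linalg.svd(Zc, compute_uv=False)  # sorted in descending order


def effective_rank(Z, eps=1e-12):
    """Entropy-based effective rank (an assumed definition, not taken from the paper)."""
    s = singular_values(Z)
    p = s / (s.sum() + eps)   # normalize the spectrum to a distribution
    p = p[p > eps]            # drop numerically zero entries before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))


def condition_number(Z, eps=1e-12):
    """Ratio of largest to smallest singular value; lower means better conditioned."""
    s = singular_values(Z)
    return float(s[0] / max(s[-1], eps))


# Synthetic stand-ins for embeddings (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
dense = rng.standard_normal((512, 16))                # variance spread over all 16 features
collapsed = (rng.standard_normal((512, 2)) @ rng.standard_normal((2, 16))
             + 1e-3 * rng.standard_normal((512, 16)))  # nearly rank-2 plus small noise

for name, Z in [("dense", dense), ("collapsed", collapsed)]:
    print(name, effective_rank(Z), condition_number(Z))
```

Under these assumed definitions, a matrix whose variance is spread across many directions scores a higher effective rank and a lower condition number than one that has collapsed onto a few directions, which is the contrast the abstract draws between dense and task-collapsed representations.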

Paper Structure

This paper contains 20 sections, 12 equations, 4 tables.

Introduction
Related Work
Background: Linear Structure in Medical Representations
Methods
Problem Setup and Representation Geometry
Spectral Spreading Objective
Subspace Consistency Across Related Observations
Feature Orthogonality and Redundancy Control
Overall Objective and Optimization
Results
Experimental Setup
Representation Geometry and Effective Rank
Downstream Linear Evaluation
Discussion
Appendix
...and 5 more sections
