Structured Contrastive Learning for Interpretable Latent Representations

Zhengyang Shen, Hua Tu, Mayue Shi

TL;DR

This work tackles transformation brittleness in neural networks, where inputs that differ only by a semantically irrelevant transformation (such as a phase shift or sensor rotation) yield drastically different representations. It introduces Structured Contrastive Learning (SCL), which partitions latent representations into invariant, variant, and free features and adds a variant-enhanced contrastive objective to explicitly model inter-sample relationships. The method improves robustness and interpretability: ECG cosine similarity under phase shifts rises from 0.25 to 0.91, and IMU-based activity recognition reaches 86.65% accuracy with 95.38% rotation consistency, covering medical signal retrieval and activity recognition without architectural changes. By shifting from reactive augmentation to proactive latent-space structuring, the approach enables glass-box, interpretable models suitable for critical applications and lays groundwork for multi-modal and few-shot extensions.

Abstract

Neural networks exhibit severe brittleness to semantically irrelevant transformations. A mere 75 ms electrocardiogram (ECG) phase shift degrades latent cosine similarity from 1.0 to 0.2, while sensor rotations collapse activity recognition performance with inertial measurement units (IMUs). We identify the root cause as "laissez-faire" representation learning, where latent spaces evolve unconstrained provided task performance is satisfied. We propose Structured Contrastive Learning (SCL), a framework that partitions latent space representations into three semantic groups: invariant features that remain consistent under given transformations (e.g., phase shifts or rotations), variant features that actively differentiate transformations via a novel variant mechanism, and free features that preserve task flexibility. This creates controllable push-pull dynamics where different latent dimensions serve distinct, interpretable purposes. The variant mechanism enhances contrastive learning by encouraging variant features to differentiate within positive pairs, enabling simultaneous robustness and interpretability. Our approach requires no architectural modifications and integrates seamlessly into existing training pipelines. Experiments on ECG phase invariance and IMU rotation robustness demonstrate superior performance: ECG similarity improves from 0.25 to 0.91 under phase shifts, while WISDM activity recognition achieves 86.65% accuracy with 95.38% rotation consistency, consistently outperforming traditional data augmentation. This work represents a paradigm shift from reactive data augmentation to proactive structural learning, enabling interpretable latent representations in neural networks.
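The push-pull idea described in the abstract can be illustrated with a small NumPy sketch. This is not the paper's loss: the contiguous slicing of the latent vector, the function names (`split_latent`, `structured_loss`), the cosine-based pull term, the hinge-style push term, and the `margin` parameter are all assumptions made for illustration. It only shows how a positive pair (an input and its transformed copy) could be pulled together on invariant dimensions, pushed apart on variant dimensions, and left unconstrained on free dimensions.

```python
import numpy as np

def split_latent(z, n_inv, n_var):
    """Split latent vectors of shape (batch, dim) into three blocks.
    Assumption: the blocks are contiguous slices [invariant | variant | free]."""
    z_inv = z[:, :n_inv]
    z_var = z[:, n_inv:n_inv + n_var]
    z_free = z[:, n_inv + n_var:]
    return z_inv, z_var, z_free

def cosine(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return num / den

def structured_loss(z, z_t, n_inv, n_var, margin=0.0):
    """Hypothetical push-pull penalty for positive pairs (x, T(x)).

    z   : latents of the original inputs, shape (batch, dim)
    z_t : latents of the transformed inputs, same shape
    pull: invariant blocks should have cosine similarity near 1
    push: variant blocks are penalised while their similarity exceeds `margin`
    The free block does not enter the loss at all.
    """
    inv, var, _ = split_latent(z, n_inv, n_var)
    inv_t, var_t, _ = split_latent(z_t, n_inv, n_var)
    pull = float(np.mean(1.0 - cosine(inv, inv_t)))
    push = float(np.mean(np.maximum(0.0, cosine(var, var_t) - margin)))
    return pull + push, pull, push

# Usage with random stand-in latents (8 dims: 3 invariant, 2 variant, 3 free).
rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))
z_t = z.copy()
z_t[:, 3:5] *= -1.0                      # variant block differs across the pair
z_t[:, 5:] = rng.normal(size=(4, 3))     # free block may change arbitrarily
total, pull, push = structured_loss(z, z_t, n_inv=3, n_var=2)
```

In a real training pipeline a term of this kind would presumably be added to the task loss and computed on encoder outputs with an autodiff framework; the NumPy version above only makes the roles of the three feature groups concrete.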

Paper Structure

This paper contains 10 sections, 5 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Transformation brittleness in latent representations. Traditional VAEs exhibit severe phase sensitivity: identical ECG waveforms at different temporal positions produce dramatically different latent vectors, with cosine similarity degrading from 1.0 to below 0.2 across phase shifts (sampling rate: 400 Hz).
  • Figure 2: Structured contrastive learning in latent space. Top: Traditional data augmentation provides no control over latent representations. Bottom: Our method partitions features into invariant (pulled together), variant (pushed apart via variant mechanism), and free (unconstrained) components, transforming neural networks into interpretable systems with controllable semantic meaning.
  • Figure 3: Phase invariance transformation results.
  • Figure 4: Clinical query effectiveness demonstration. Our method (rightmost) successfully retrieves morphologically similar signals regardless of phase alignment (0.935-0.941 similarity), while baseline and data augmentation methods remain "phase-locked," achieving much lower similarities (0.522-0.628) and missing clinically relevant patterns.
  • Figure 5: Feature space organization through structured learning. t-SNE visualizations show a progressive organization from baseline scatter (leftmost) to our structured clusters (rightmost), visually demonstrating how structured learning converts chaotic latent spaces into interpretable, organized representations with clear activity separation.
  • ...and 1 more figure
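
To make the numbers in the abstract and the Figure 1 caption concrete: a 75 ms shift at a 400 Hz sampling rate corresponds to 30 samples. The sketch below computes that offset and evaluates cosine similarity between a synthetic pulse and its shifted copy. The Gaussian pulse is a stand-in, not ECG data, and the similarity is measured on the raw signal rather than on latent vectors, so the resulting value is not comparable to the figures reported in the paper. The helper names are hypothetical.

```python
import numpy as np

FS_HZ = 400      # sampling rate stated in the Figure 1 caption
SHIFT_MS = 75    # phase shift quoted in the abstract

def shift_in_samples(shift_ms, fs_hz):
    """Convert a time shift in milliseconds to a whole number of samples."""
    return int(round(shift_ms * fs_hz / 1000))

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

n_shift = shift_in_samples(SHIFT_MS, FS_HZ)   # 75 ms * 400 Hz / 1000 = 30 samples

# Synthetic stand-in for one beat: a narrow Gaussian pulse in a 1 s window.
t = np.arange(FS_HZ) / FS_HZ
beat = np.exp(-0.5 * ((t - 0.5) / 0.01) ** 2)

# Same waveform, delayed by 75 ms (circular shift; the pulse stays inside the window).
shifted = np.roll(beat, n_shift)

sim_same = cosine_similarity(beat, beat)
sim_shifted = cosine_similarity(beat, shifted)
print(f"shift = {n_shift} samples, self = {sim_same:.3f}, shifted = {sim_shifted:.3g}")
```

The shifted copy has the same morphology but almost no sample-wise overlap with the original, which gives some intuition for why a representation with no explicit invariance constraint can treat the two as unrelated.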