Temporal Inversion for Learning Interval Change in Chest X-Rays

Hanbin Ko, Kyeongmin Jeon, Doowoong Choi, Chang Min Park

Abstract

Recent advances in vision-language pretraining have enabled strong medical foundation models, yet most analyze radiographs in isolation, overlooking the key clinical task of comparing prior and current images to assess interval change. For chest radiographs (CXRs), capturing interval change is essential, as radiologists must evaluate not only the static appearance of findings but also how they evolve over time. We introduce TILA (Temporal Inversion-aware Learning and Alignment), a simple yet effective framework that uses temporal inversion, reversing image pairs, as a supervisory signal to enhance the sensitivity of existing temporal vision-language models to directional change. TILA integrates inversion-aware objectives across pretraining, fine-tuning, and inference, complementing conventional appearance modeling with explicit learning of temporal order. We also propose a unified evaluation protocol to assess order sensitivity and consistency under temporal inversion, and introduce MS-CXR-T_retrieval, a retrieval evaluation set constructed through a general protocol that can be applied to any temporal CXR dataset. Experiments on public datasets and real-world hospital cohorts demonstrate that TILA consistently improves progression classification and temporal embedding alignment when applied to multiple existing architectures.

Paper Structure

This paper contains 54 sections, 10 equations, 3 figures, 8 tables.

Figures (3)

  • Figure 1: TILA Framework: Temporal Inversion-aware Learning and Alignment. The framework comprises three stages: pretraining, fine-tuning, and inference. (a–b) Pretraining: paired CXRs are encoded in both original and reversed orders; the Change-aware Sigmoid Loss aligns unchanged cases and separates changed ones. (c) Fine-tuning: the Bidirectional Cross-Entropy (BiCE) enforces label inversion, while the Temporal Consistency Loss (TCL) aligns probability distributions under reversal. (d) Inference: forward and reversed predictions are fused via inversion-aware scoring to enhance robustness and order consistency.
  • Figure 2: Score Distribution Analysis for Pleural Effusion (MS-CXR-T). Distribution of prediction scores (scaled by 10) for each pleural effusion progression label, comparing baseline and TILA models in the zero-shot setting. Each boxplot separates cases by progression label (improved, stable, worsened) and label quality (consensus vs. disagreement), shown for both standard and inversion-aware (combined) scoring. Key trends illustrated: i) Changes in scores for each label (improved, stable, worsened). ii) Impact of applying inversion-aware scoring (combined). iii) Differences in scores based on label quality (consensus vs. disagreement).
  • Figure 3: Example workflow for constructing MS-CXR-T_retrieval from the original benchmark.
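
The fine-tuning and inference stages described in the Figure 1 caption can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three-class label order, the label-inversion map (improved and worsened swap, stable is unchanged), the use of a symmetric KL divergence as the consistency term, and simple probability averaging as the fusion rule. The function names are hypothetical.

```python
import math

# Assumed class order; reversing an image pair is assumed to swap
# "improved" and "worsened" while leaving "stable" unchanged.
LABELS = ("improved", "stable", "worsened")
INVERT = (2, 1, 0)

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def flip_classes(values):
    """Re-express a reversed-pair distribution in forward-order classes."""
    return [values[INVERT[k]] for k in range(len(values))]

def bidirectional_ce(fwd_logits, rev_logits, label):
    """Cross-entropy on the forward pair with `label`, plus cross-entropy
    on the reversed pair with the inverted label (illustrative form)."""
    lp_fwd = log_softmax(fwd_logits)
    lp_rev = log_softmax(rev_logits)
    return -lp_fwd[label] - lp_rev[INVERT[label]]

def temporal_consistency(fwd_logits, rev_logits):
    """Symmetric KL between the forward distribution and the class-flipped
    reversed distribution (an assumed stand-in for a consistency loss)."""
    lp = log_softmax(fwd_logits)
    lq = flip_classes(log_softmax(rev_logits))
    kl_pq = sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))
    kl_qp = sum(math.exp(b) * (b - a) for a, b in zip(lp, lq))
    return 0.5 * (kl_pq + kl_qp)

def fused_prediction(fwd_logits, rev_logits):
    """Average forward and class-flipped reversed probabilities (assumed
    fusion rule) and return the winning label with the fused scores."""
    p_fwd = [math.exp(v) for v in log_softmax(fwd_logits)]
    p_rev = flip_classes([math.exp(v) for v in log_softmax(rev_logits)])
    fused = [(a + b) / 2 for a, b in zip(p_fwd, p_rev)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return LABELS[best], fused

# An order-sensitive model: "worsened" forward, "improved" when reversed.
fwd, rev = [0.2, 0.1, 2.0], [2.0, 0.1, 0.2]
label, scores = fused_prediction(fwd, rev)
```

In this toy setting, a model whose reversed-pair prediction mirrors its forward prediction incurs a near-zero consistency penalty, whereas a model that outputs the same class regardless of image order is penalized. That is the kind of order sensitivity the caption attributes to the fine-tuning losses.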