CT-to-X-ray Distillation Under Tiny Paired Cohorts: An Evidence-Bounded Reproducible Pilot Study

Bo Ma, Jinsong Wu, Weiqi Yan, Hongjiang Wei

Abstract

Chest X-ray and computed tomography (CT) provide complementary views of thoracic disease, yet most computer-aided diagnosis models are trained and deployed within a single imaging modality. The concrete question studied here is narrower and deployment-oriented: on a patient-level paired chest cohort, can CT act as training-only supervision for a binary (disease versus non-disease) X-ray classifier without requiring CT at inference time? We study this setting as a cross-modality teacher-student distillation problem and use JDCNet as an executable pilot scaffold rather than as a validated superior architecture. On the original patient-level paired split from a public paired chest imaging cohort, a stripped-down plain cross-modal logit-KD control attains the highest mean result on the four-image validation subset (0.875 accuracy and 0.714 macro-F1), whereas the full module-augmented JDCNet variant remains at 0.750 accuracy and 0.429 macro-F1. To test whether that ranking is a split artifact, we additionally run eight patient-level Monte Carlo resamples with same-case comparisons, stronger mechanism controls based on attention transfer and feature hints, and imbalance-sensitive analyses. Under this resampled protocol, late fusion attains the highest mean accuracy (0.885), same-modality distillation attains the highest mean macro-F1 (0.554) and balanced accuracy (0.660), the plain cross-modal control drops to 0.500 mean balanced accuracy, and neither attention transfer nor feature hints recover a robust cross-modality advantage. The contribution of this study is therefore not a validated CT-to-X-ray architecture, but a reproducible and evidence-bounded pilot protocol that makes explicit the exact task definition, the failure modes, the ranking instability, and the minimum requirements for future credible CT-to-X-ray transfer claims.
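To see how coarse these metrics are on a four-image validation subset, the following sketch computes accuracy, macro-F1, and balanced accuracy by hand. It assumes the class composition given in the Figure 2 caption (three positive images, one negative); the label vectors and the helper name `binary_metrics` are hypothetical and are not taken from the study's code. Under that assumption, a classifier that predicts positive for every image scores 0.750 accuracy and about 0.429 macro-F1, which coincides with the values reported for the full JDCNet variant. Whether the model actually behaved that way is not stated here.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, macro-F1, balanced accuracy) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))

    def f1_for(cls):
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        denom = 2 * tp + fp + fn
        # Convention: F1 is 0 when the class is never present or predicted.
        return 2 * tp / denom if denom else 0.0

    def recall_for(cls):
        support = sum(t == cls for t, _ in pairs)
        hits = sum(t == cls and p == cls for t, p in pairs)
        return hits / support if support else 0.0

    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    macro_f1 = (f1_for(0) + f1_for(1)) / 2
    balanced_acc = (recall_for(0) + recall_for(1)) / 2
    return accuracy, macro_f1, balanced_acc


# Hypothetical validation labels: three positive images, one negative image.
y_true = [1, 1, 1, 0]

# A degenerate classifier that always predicts "disease".
all_positive = binary_metrics(y_true, [1, 1, 1, 1])

# A classifier that gets all four images right.
perfect = binary_metrics(y_true, [1, 1, 1, 0])

print("all-positive:", [round(m, 3) for m in all_positive])
print("perfect:     ", [round(m, 3) for m in perfect])
```

With only four images, a single changed prediction moves accuracy by 0.25, and the balanced accuracy of the all-positive predictor is 0.5, which is why imbalance-sensitive metrics matter in this regime.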

Paper Structure

This paper contains 29 sections, 1 equation, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Overview of the executable pilot scaffold evaluated in this study. The CT teacher path is active only during training, the X-ray student path defines the deployed model, and DPE/MHRA/DFPN denote optional mechanisms whose value is tested rather than assumed.
  • Figure 2: Feasibility-only fixed-split summary on the paired X-ray target cohort. Bars show repeated-run means, and overlaid points show per-seed outcomes. The stripped-down plain cross-modal logit-KD control attains the highest mean paired-cohort result, but the per-seed spread shows that this advantage remains unstable on a validation split containing only four patients and four X-ray images (three positive, one negative).
  • Figure 3: Primary same-case evidence across eight patient-level Monte Carlo resamples on the paired cohort. Each point denotes one held-out split, and every model in a given split is evaluated on the exact same held-out patients. Each split contains five validation patients and 5 to 10 validation images, with one negative patient by construction. This figure is the visual counterpart to the primary same-case evidence table: the ranking changes relative to the original four-image validation split, late fusion attains the highest mean accuracy, same-modality distillation attains the highest mean macro-F1 and balanced accuracy, and neither plain cross-modal logit KD nor the added mechanism controls yields a stable cross-modality advantage.
  • Figure 4: Cross-modality distillation ablation on the paired X-ray target cohort. The near-flat response surface indicates that the current setup is data-limited: changing the temperature and distillation weight has little effect on macro-F1 under the present split. The figure should therefore be read as evidence that the current regime is not discriminative enough for hyperparameter ranking, not as proof of a meaningful optimum.
  • Figure 5: Module ablations for the cross-modality pipeline. Bars show repeated-run means, and overlaid points show per-seed outcomes. Under the current four-image validation split, the figure is more informative as evidence of instability and weak module discriminability than as evidence that any individual module has been positively validated.
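
The temperature and distillation weight varied in the Figure 4 ablation are the two knobs of a logit-KD objective. As a minimal sketch, the block below implements the standard Hinton-style form for a single sample: a hard-label cross-entropy term blended with a temperature-softened KL term from teacher to student. This form, the function names, and the symbols `temperature` and `alpha` are assumptions for illustration; the exact loss used for the plain cross-modal logit-KD control is not given in this section.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Assumed Hinton-style logit KD for one sample (not the paper's exact loss).

    loss = (1 - alpha) * CE(student, label)
           + alpha * temperature**2 * KL(teacher_T || student_T)
    """
    # Hard-label cross-entropy on the unsoftened student distribution.
    ce = -math.log(softmax(student_logits)[label])

    # KL divergence between temperature-softened teacher and student outputs.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale comparable.
    return (1 - alpha) * ce + alpha * temperature ** 2 * kl


# Hypothetical logits for one paired case: X-ray student versus CT teacher.
student = [0.2, -0.1]
teacher = [2.0, -1.0]

for temperature in (1.0, 2.0, 4.0):
    for alpha in (0.0, 0.5, 1.0):
        value = kd_loss(student, teacher, label=0, temperature=temperature, alpha=alpha)
        print(f"T={temperature} alpha={alpha} loss={value:.4f}")
```

Setting `alpha` to 0 recovers plain supervised training, and identical teacher and student logits make the KL term vanish. A grid like the one looped over here is the kind of surface the Figure 4 caption describes as near-flat in macro-F1.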