SFT-KD-Recon: Learning a Student-friendly Teacher for Knowledge Distillation in Magnetic Resonance Image Reconstruction

Matcha Naga Gayathri, Sriprabha Ramanarayanan, Mohammad Al Fahim, Rahul G S, Keerthi Ram, Mohanasankar Sivaprakasam

TL;DR

This work addresses the compression of deep cascaded MRI reconstruction models by adding a student-aware training stage before knowledge distillation (KD). The authors propose SFT-KD-Recon, which jointly trains a teacher with multiple student branches so that the teacher learns representations matched to the student's capacity, and then applies standard KD to distill into the smaller model. They define three loss terms (teacher-target reconstruction, student-target reconstruction, and student-teacher imitation) and report that the approach consistently improves reconstruction quality across five KD methods on brain and cardiac datasets under 4x and 5x undersampling. The distilled student approaches teacher performance (the PSNR gap narrows from 0.53 dB to 0.03 dB) and scores better on image fidelity metrics (HFN, VIF) than with conventional KD, which suggests that lighter MRI reconstruction models can be deployed with little loss of accuracy. The method is compatible with existing KD frameworks and could support faster, more resource-efficient MRI reconstruction.
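The three loss terms named above can be sketched as a single weighted objective. This is a minimal illustration, not the authors' implementation: the L1 distance, the weights `alpha`, `beta`, `gamma`, the averaging over student branches, and the function names `l1` and `sft_loss` are all assumptions introduced here, and images are represented as flat lists of floats purely to keep the example dependency-free.

```python
from typing import Sequence

Image = Sequence[float]  # a flattened image; an illustrative simplification


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two equally sized flattened images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sft_loss(
    teacher_out: Image,
    branch_outs: Sequence[Image],
    target: Image,
    alpha: float = 1.0,  # weight of the teacher-target term (assumed)
    beta: float = 1.0,   # weight of the student-target term (assumed)
    gamma: float = 1.0,  # weight of the student-teacher term (assumed)
) -> float:
    """Weighted sum of the three terms, averaged over student branches."""
    n = len(branch_outs)
    # Teacher-target reconstruction: teacher output vs. ground truth.
    rec_teacher = l1(teacher_out, target)
    # Student-target reconstruction: each student branch vs. ground truth.
    rec_student = sum(l1(s, target) for s in branch_outs) / n
    # Student-teacher imitation: each student branch vs. the teacher output.
    imitation = sum(l1(s, teacher_out) for s in branch_outs) / n
    return alpha * rec_teacher + beta * rec_student + gamma * imitation


# Toy usage: a teacher that matches the target and one branch that is off by 0.5.
target = [0.0, 1.0, 0.5, 0.25]
teacher = [0.0, 1.0, 0.5, 0.25]
branch = [0.5, 1.5, 1.0, 0.75]
total = sft_loss(teacher, [branch], target)
```

In this toy case the teacher term is zero and the two student terms are each 0.5, so the imitation term pulls the branch toward the teacher exactly as strongly as the reconstruction term pulls it toward the target.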

Abstract

Deep cascaded architectures for magnetic resonance imaging (MRI) acceleration have shown remarkable success in providing high-quality reconstruction. However, as the number of cascades increases, the improvements in reconstruction tend to become marginal, indicating possible excess model capacity. Knowledge distillation (KD) is an emerging technique to compress these models, in which a trained deep teacher network distills knowledge to a smaller student network so that the student learns to mimic the behavior of the teacher. Most KD methods focus on effectively training the student with a pre-trained teacher that is unaware of the student model. We propose SFT-KD-Recon, a student-friendly teacher training approach in which, as a step prior to KD, the teacher is trained along with the student so that it becomes aware of the structure and capacity of the student and its representations align with the student's. In SFT, the teacher is jointly trained with the unfolded branch configurations of the student blocks using three loss terms (teacher-reconstruction loss, student-reconstruction loss, and teacher-student imitation loss), followed by KD of the student. We perform extensive experiments for MRI acceleration at 4x and 5x under-sampling on brain and cardiac datasets with five KD methods, using the proposed approach as a prior step. We consider the DC-CNN architecture and set up the teacher as D5C5 (141765 parameters) and the student as D3C5 (49285 parameters), a compression of 2.87:1. Results show that (i) our approach consistently improves the KD methods, with better reconstruction performance and image quality, and (ii) the student distilled using our approach is competitive with the teacher, with the performance gap reduced from 0.53 dB to 0.03 dB.
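The reported parameter counts and compression ratio can be checked with a short calculation. The sketch below assumes details that the abstract does not state: 3x3 kernels, 32 filters in the hidden layers, single-channel input and output, bias terms on every convolution, and no trainable parameters outside the convolutions. Under those assumptions the arithmetic reproduces both reported counts exactly, reading D as the number of convolution layers per block and C as the number of cascaded blocks; the helper names `conv_params` and `dc_cnn_params` are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights plus biases of one k x k convolution layer."""
    return c_in * c_out * k * k + c_out


def dc_cnn_params(depth: int, cascades: int, channels: int = 1,
                  filters: int = 32, k: int = 3) -> int:
    """Parameter count of a cascade of identical CNN blocks.

    Assumed layout per block: one input layer (channels -> filters),
    depth - 2 hidden layers (filters -> filters), and one output layer
    (filters -> channels).
    """
    first = conv_params(channels, filters, k)
    hidden = (depth - 2) * conv_params(filters, filters, k)
    last = conv_params(filters, channels, k)
    return cascades * (first + hidden + last)


teacher_params = dc_cnn_params(depth=5, cascades=5)  # D5C5
student_params = dc_cnn_params(depth=3, cascades=5)  # D3C5
ratio = teacher_params / student_params
reduction = 1 - student_params / teacher_params

print(teacher_params, student_params)  # 141765 49285
print(f"compression {ratio:.3f}:1, parameter reduction {reduction:.1%}")
```

The ratio comes out at roughly 2.876, which matches the reported 2.87:1 when truncated to two decimals, and corresponds to a parameter reduction of about 65%.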

Paper Structure (26 sections, 5 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Comparison between the standard KD and SFT-KD-Recon. (a) The standard KD trains the teacher alone and distills knowledge to the student. (b) SFT-KD-Recon trains the teacher along with the student branches and then distills effective knowledge to the student. (c) SFT vs. SFT-KD-Recon: the former learns in the feature domain via a residual CNN, while the latter learns in the image domain via an image-domain CNN.
  • Figure 2: Student-friendly training of the teacher. The teacher DC-CNN has five blocks, each having a CNN with five convolution layers and a DF layer, and the student DC-CNN has five blocks, each having three convolution layers and a DF layer. The teacher is trained with three loss terms: $L_{rec}^{T}$, $L_{rec}^{S}$ (blue arrows), and $L_{imit}$ (violet arrows). Note that all the blocks of the student except the first learn initial weights during SFT training.
  • Figure 3: Visual results (from left to right): target, target inset, ZF, teacher, student, Std-KD, SFT-KD-Recon, student residue, Std-KD residue, SFT-KD-Recon residue with respect to the target, for the brain (top) and cardiac (bottom) with 4x acceleration. We note that in addition to lower reconstruction errors, the SFT-KD distilled student is able to retain finer structures better when compared to the student and Std-KD output.
  • Figure 4: (a) SSIM box plots of KD and SFT-KD-Recon with respect to the teacher and student across the brain and cardiac datasets for 4x and 5x acceleration. (b) Reconstruction loss of the teacher, student, SFT-Teacher, KD, and SFT-KD-Recon on the validation set for the cardiac dataset at 4x acceleration. KD and SFT-KD-Recon use AT as the distillation method.
  • Figure 5: Visual results: Top - SFT training setting (from left to right): target, target inset, ZF, teacher, student, Std. KD, SFT; Bottom - SFT-KD-Recon setting (left to right): target, target inset, ZF, teacher, student, Std. KD, SFT-KD-Recon for MRBrainS dataset with 4x acceleration factor.
  • ...and 3 more figures