Low-Level Dataset Distillation for Medical Image Enhancement

Fengzhi Xu, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou, Yi Zhang

TL;DR

This paper addresses the training and storage costs of the large medical image datasets required for pixel-level enhancement by introducing low-level dataset distillation (DD). A shared anatomical prior, built from a representative patient, initializes the distilled data, and a Structure-Preserving Personalized Generation (SPG) module injects patient-specific information while preserving pixel-level fidelity. Patient-aware gradient matching aligns the gradients obtained from the distilled data with those obtained from each patient's real data, so that only condensed training data, rather than raw patient data, needs to be shared. The distilled data is used across modalities (CT and MRI) and tasks (super-resolution and restoration). Experiments show that the method approaches full-data performance with substantially reduced storage and generalizes across architectures and large-scale settings.

Abstract

Medical image enhancement is clinically valuable, but existing methods require large-scale datasets to learn complex pixel-level mappings, and the substantial training and storage costs of these datasets hinder practical deployment. While dataset distillation (DD) can alleviate these burdens, existing methods mainly target high-level tasks, where multiple samples share the same label. This many-to-one mapping allows distilled data to capture shared semantics and achieve information compression. In contrast, low-level tasks involve a many-to-many mapping that requires pixel-level fidelity, making low-level DD an underdetermined problem: a small distilled dataset cannot fully constrain the dense pixel-level mappings. To address this, we propose the first low-level DD method for medical image enhancement. We first leverage anatomical similarities across patients to construct a shared anatomical prior based on a representative patient, which serves as the initialization for the distilled data of different patients. This prior is then personalized for each patient using a Structure-Preserving Personalized Generation (SPG) module, which integrates patient-specific anatomical information into the distilled dataset while preserving pixel-level fidelity. For each low-level task, the distilled data is used to construct task-specific high- and low-quality training pairs. Patient-specific knowledge is injected into the distilled data by aligning the gradients computed from networks trained on the distilled pairs with those from the corresponding patient's raw data. Notably, downstream users cannot access raw patient data. Instead, only a distilled dataset containing abstract training information is shared, which excludes patient-specific details and thus preserves privacy.
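The pair construction and gradient alignment described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the enhancer is a toy two-parameter model f(x) = theta[0] * x + theta[1] with fixed parameters, the low-quality image is produced by a hypothetical `degrade` function (average pooling followed by pixel repetition), the matching objective is a plain squared distance between gradients, and the distilled image is updated with finite differences instead of automatic differentiation. The SPG module is not modeled; a near-constant array stands in for the shared anatomical prior.

```python
import numpy as np

def degrade(hq, factor=2):
    """Hypothetical SR degradation: average-pool by `factor`, then repeat
    pixels so the low-quality image keeps the high-quality shape."""
    h, w = hq.shape
    pooled = hq.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)

def model_grad(theta, lq, hq):
    """Gradient of the MSE loss w.r.t. theta for the toy enhancer
    f(x) = theta[0] * x + theta[1] (an assumed stand-in network)."""
    residual = theta[0] * lq + theta[1] - hq
    return np.array([2.0 * np.mean(residual * lq), 2.0 * np.mean(residual)])

def matching_loss(distilled_hq, real_hq, theta):
    """Squared distance between the gradient from the distilled pair
    and the gradient from the real pair."""
    g_syn = model_grad(theta, degrade(distilled_hq), distilled_hq)
    g_real = model_grad(theta, degrade(real_hq), real_hq)
    return float(np.sum((g_syn - g_real) ** 2))

def distill_step(distilled_hq, real_hq, theta, lr=1.0, eps=1e-4):
    """One gradient-descent update of the distilled image, using a
    forward finite-difference estimate of the matching-loss gradient."""
    base = matching_loss(distilled_hq, real_hq, theta)
    grad = np.zeros_like(distilled_hq)
    for idx in np.ndindex(distilled_hq.shape):
        bumped = distilled_hq.copy()
        bumped[idx] += eps
        grad[idx] = (matching_loss(bumped, real_hq, theta) - base) / eps
    return distilled_hq - lr * grad

rng = np.random.default_rng(0)
real_hq = rng.random((8, 8))               # stand-in for one patient's slice
prior = 0.2 + 0.05 * rng.random((8, 8))    # stand-in for the shared prior
theta = np.array([0.5, 0.1])               # fixed toy network parameters

distilled = prior.copy()
loss_before = matching_loss(distilled, real_hq, theta)
for _ in range(50):
    distilled = distill_step(distilled, real_hq, theta)
loss_after = matching_loss(distilled, real_hq, theta)
print(f"matching loss: {loss_before:.6f} -> {loss_after:.6f}")
```

Only `distilled` would be shared in this toy setting; `real_hq` is used solely to compute the target gradient. A practical implementation would presumably match gradients over many network states and use automatic differentiation, which this sketch omits for brevity.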

Paper Structure

This paper contains 16 sections, 9 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Comparison between high-level and low-level tasks.
  • Figure 2: Overview of the proposed method.
  • Figure 3: t-SNE visualization of gradient data from different patients. Different colors denote different patients.
  • Figure 4: Qualitative super-resolution results of different algorithms on the CT and MRI modalities. The display window is [-950, 50] HU for the first row and [-290, 310] HU for the second row.
  • Figure 5: Qualitative results of different algorithms.
  • ...and 3 more figures