A Study in Dataset Distillation for Image Super-Resolution

Tobias Dietz, Brian B. Moser, Tobias Nauen, Federico Raue, Stanislav Frolov, Andreas Dengel

TL;DR

This paper tackles data efficiency in image super-resolution by applying dataset distillation to a regression task. It adapts gradient matching from Dataset Condensation (DC) to SR by replacing the classification loss with a reconstruction loss $\ell_{SR}$, and it introduces pseudo-label grouping and latent-space distillation using a pretrained generator such as StyleGAN-XL. The experiments show that a distilled dataset occupying about $8.88\%$ of the original size can train SR models with near full-data fidelity, with latent-space distillation outperforming pixel-space methods and generalizing across $2\times$ and $4\times$ upscaling. The work provides a practical path toward memory- and compute-efficient SR pipelines and offers foundational insights for generative data learning in restoration tasks.
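The core adaptation, gradient matching with a reconstruction loss in place of the classification loss, can be sketched in NumPy. This is a toy illustration under loudly stated assumptions, not the authors' implementation: the SR model is a single linear map from flattened LR pixels to HR pixels, $\ell_{SR}$ is taken to be mean squared error, LR inputs are produced by 2× average pooling instead of bicubic downsampling, the distance is the per-output-row cosine distance used in Dataset Condensation, and central finite differences stand in for automatic differentiation. All function names and hyperparameters are hypothetical, and pseudo-label grouping is not modeled.

```python
import numpy as np

def downsample2x(hr):
    """Toy degradation (assumption): 2x average pooling instead of bicubic."""
    n, h, w = hr.shape
    return hr.reshape(n, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def sr_loss_grad(W, hr):
    """d l_SR / dW for a toy linear SR model HR ~ W @ LR, with l_SR = MSE (assumption)."""
    n = hr.shape[0]
    x = downsample2x(hr).reshape(n, -1)          # (N, d_lr) low-res inputs
    y = hr.reshape(n, -1)                        # (N, d_hr) high-res targets
    residual = x @ W.T - y                       # (N, d_hr) reconstruction error
    return 2.0 * residual.T @ x / residual.size  # (d_hr, d_lr)

def match_distance(g_real, g_syn, eps=1e-8):
    """DC-style distance: sum over output rows of (1 - cosine similarity)."""
    dots = (g_real * g_syn).sum(axis=1)
    norms = np.linalg.norm(g_real, axis=1) * np.linalg.norm(g_syn, axis=1)
    return float((1.0 - dots / (norms + eps)).sum())

def distill(real_hr, syn_hr, steps=10, lr=1.0, fd=1e-5, seed=0):
    """Update synthetic HR images so their l_SR gradients match those of real data."""
    rng = np.random.default_rng(seed)
    syn = syn_hr.copy()
    d_hr = real_hr[0].size
    d_lr = d_hr // 4
    history = []  # (distance before, distance after) for each outer step
    for _ in range(steps):
        # Sample a freshly initialized toy model each outer step, as DC does with networks.
        W = rng.normal(scale=0.1, size=(d_hr, d_lr))
        g_real = sr_loss_grad(W, real_hr)

        def objective(s):
            return match_distance(g_real, sr_loss_grad(W, s))

        before = objective(syn)
        # Central finite differences stand in for autodiff in this toy example.
        grad = np.zeros_like(syn)
        for idx in np.ndindex(syn.shape):
            bump = np.zeros_like(syn)
            bump[idx] = fd
            grad[idx] = (objective(syn + bump) - objective(syn - bump)) / (2 * fd)
        # Backtracking: accept a step only if it lowers the distance for this W.
        step = lr
        while step > 1e-8:
            candidate = syn - step * grad
            if objective(candidate) < before:
                syn = candidate
                break
            step /= 2
        history.append((before, objective(syn)))
    return syn, history

rng = np.random.default_rng(1)
real = rng.random((8, 4, 4))   # eight 4x4 "real" HR images
syn0 = rng.random((2, 4, 4))   # two synthetic HR images to distill
syn, hist = distill(real, syn0)
print(hist[0], hist[-1])
```

In a real pipeline the SR model would be a convolutional network and both gradients would come from autodiff; the finite-difference loop only keeps the sketch dependency-free.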

Abstract

Dataset distillation aims to compress large datasets into compact yet highly informative subsets that preserve the training behavior of the original data. While this concept has gained traction in classification, its potential for image Super-Resolution (SR) remains largely untapped. In this work, we conduct the first systematic study of dataset distillation for SR, evaluating both pixel- and latent-space formulations. We show that a distilled dataset, occupying only 8.88% of the original size, can train SR models that retain nearly the same reconstruction fidelity as those trained on full datasets. Furthermore, we analyze how initialization strategies and distillation objectives affect efficiency, convergence, and visual quality. Our findings highlight the feasibility of SR dataset distillation and establish foundational insights for memory- and compute-efficient generative restoration models.
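The pixel- and latent-space formulations differ in what is stored and updated. A minimal sketch, assuming a frozen linear stand-in for a pretrained generator such as StyleGAN-XL: only the latent code $z$ is kept, images are decoded as $x = G(z)$, and the pixel-space gradient is pulled back to $z$ by the chain rule. The quadratic objective is a placeholder for the gradient-matching loss, and all names and sizes are hypothetical.

```python
import numpy as np

def optimize_latent(A, grad_x_fn, z0, steps=200):
    """Optimize a latent code z; pixels are decoded by a frozen toy generator x = A @ z."""
    # Step size 1 / ||A||_2^2 is safe when the pixel loss is 1-smooth, as in the demo below.
    lr = 1.0 / np.linalg.norm(A, 2) ** 2
    z = z0.copy()
    for _ in range(steps):
        x = A @ z                  # decode latent to pixel space
        g_x = grad_x_fn(x)         # dL/dx from the distillation objective
        z = z - lr * (A.T @ g_x)   # chain rule through the frozen generator
    return z

rng = np.random.default_rng(0)
d_pix, d_lat = 64, 8                                  # stored values per sample: pixels vs latent
A = rng.normal(size=(d_pix, d_lat)) / np.sqrt(d_pix)  # hypothetical frozen generator
target = A @ rng.normal(size=d_lat)                   # stand-in target on the generator's range

def loss(x):
    """Placeholder quadratic objective standing in for gradient matching."""
    return 0.5 * np.sum((x - target) ** 2)

def grad(x):
    return x - target

z0 = np.zeros(d_lat)
z = optimize_latent(A, grad, z0)
print(loss(A @ z0), loss(A @ z), z.size, (A @ z).size)
```

A pixel-space method would instead update all `d_pix` values of `x` directly; restricting updates to the generator's latent space keeps samples on the generator's output manifold and shrinks what must be stored per sample.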

Paper Structure

This paper contains 17 sections, 6 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Overview of Dataset Condensation (DC) adapted for image SR. Our framework integrates a generative model to enable latent-space distillation and distills large SR datasets into smaller, highly representative synthetic samples.
  • Figure 2: Illustration of the dataset preparation process. HR images are divided into overlapping sub-images using a stride chosen to preserve edge details. During training, random patches are extracted from these sub-images to construct training batches. We exploit this preparation process to associate each sub-image with its full image, effectively distilling the full-size image into a sub-image.
  • Figure 3: Evolution of synthetic high-resolution samples over distillation steps. Each row shows how the distillation process optimizes the synthetic samples of one experiment over 1,000 distillation iterations. The first row makes clear that a good initialization is crucial for distilling SR images, as already quantified in our earlier experiments. Artifacts also do not appear suddenly but develop gradually over the course of distillation. Furthermore, for latent distillation only minor changes occur in later iterations, indicating that fewer iterations might suffice to obtain a similarly well-performing synthetic dataset.
  • Figure 4: Qualitative comparison of images produced by SR models trained in the different distillation experiments. There is no significant visual difference between SR models trained on real and on distilled images.
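The preparation process in Figure 2 can be sketched as follows, assuming square crops, a stride smaller than the crop so that neighboring sub-images overlap, and a final window snapped to the image border so edge pixels are always covered (our reading of "preserve edge details"). The parent index recorded for every sub-image is the link between a sub-image and its full image that the caption describes. Crop, stride, and patch sizes and all function names are hypothetical.

```python
import numpy as np

def window_starts(length, crop, stride):
    """Start offsets of overlapping windows along one axis (requires length >= crop)."""
    if length < crop:
        raise ValueError("image side is smaller than the crop size")
    starts = list(range(0, length - crop + 1, stride))
    if starts[-1] + crop < length:  # snap a last window to the border
        starts.append(length - crop)
    return starts

def extract_subimages(images, crop=8, stride=6):
    """Cut each HR image into overlapping crops and record the index of its full image."""
    subs, parents = [], []
    for idx, img in enumerate(images):
        h, w = img.shape[:2]
        for top in window_starts(h, crop, stride):
            for left in window_starts(w, crop, stride):
                subs.append(img[top:top + crop, left:left + crop])
                parents.append(idx)
    return np.stack(subs), np.array(parents)

def random_patches(subs, patch, batch, rng):
    """Build a training batch of random patches taken from random sub-images."""
    picks = rng.integers(0, len(subs), size=batch)
    out = np.empty((batch, patch, patch) + subs.shape[3:], dtype=subs.dtype)
    for b, i in enumerate(picks):
        top = rng.integers(0, subs.shape[1] - patch + 1)
        left = rng.integers(0, subs.shape[2] - patch + 1)
        out[b] = subs[i, top:top + patch, left:left + patch]
    return out, picks

rng = np.random.default_rng(0)
images = [rng.random((16, 16)), rng.random((16, 20))]
subs, parents = extract_subimages(images)
batch, picks = random_patches(subs, patch=4, batch=4, rng=rng)
print(subs.shape, parents[picks])  # parent index = full image each patch came from
```

Keeping `parents` alongside the sub-images is what allows all crops of one full image to be treated as a group during distillation.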