Dataset Distillation for Super-Resolution without Class Labels and Pre-trained Models

Sunwoo Cho; Yejin Jung; Nam Ik Cho; Jae Woong Soh

TL;DR

This work tackles the data-hungry nature of single image super-resolution (SISR) by introducing a dataset distillation method that requires neither class labels nor pre-trained SR models. It combines patch-level informativity (via $\mathrm{PSNR}_{bic}$ and CLIP-based clustering) with a Latent Diffusion Model fine-tuned under a Minimax loss that includes $L_{simple}$, $L_r$, $L_d$, and $L_{SR}$, then directly samples distilled HR-LR pairs to train SR models. The approach delivers state-of-the-art data efficiency on OST: across multiple SR architectures, models stay close to their full-data baselines while training on as little as $0.68\%$ of the original data. For the baseline Transformer, total training time drops to about 5 hours (4 hours of diffusion fine-tuning plus 1 hour of SR training), compared with 11 hours on the full dataset. Its cross-architecture effectiveness and independence from pre-trained SR models make it practical for scalable SR training under limited data and compute.
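To make the patch-informativity score concrete, the following is a minimal sketch of a PSNR computation in plain Python. It assumes that $\mathrm{PSNR}_{bic}$ denotes the PSNR between an HR patch and a bicubic reconstruction of it, which this summary does not spell out. The function name `psnr`, the list-of-rows patch format, and the 8-bit peak value are illustrative choices, and the bicubic step itself is left out.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized 2-D grayscale patches.

    Patches are given as lists of rows of pixel intensities.
    `max_value` is the peak intensity (255 is an assumed 8-bit range).
    """
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        # Identical patches: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Under the stated assumption, a patch that bicubic interpolation already reconstructs well yields a high score, so the score can be used to rank patches by how much detail simple interpolation fails to recover.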

Abstract

Training deep neural networks has become increasingly demanding, requiring large datasets and significant computational resources, especially as model complexity advances. Data distillation methods, which aim to improve data efficiency, have emerged as promising solutions to this challenge. In the field of single image super-resolution (SISR), the reliance on large training datasets highlights the importance of these techniques. Recently, a generative adversarial network (GAN) inversion-based data distillation framework for SR was proposed, showing potential for better data utilization. However, the current method depends heavily on pre-trained SR networks and class-specific information, limiting its generalizability and applicability. To address these issues, we introduce a new data distillation approach for image SR that does not need class labels or pre-trained SR models. In particular, we first extract high-gradient patches and categorize images based on CLIP features, then fine-tune a diffusion model on the selected patches to learn their distribution and synthesize distilled training images. Experimental results show that our method achieves state-of-the-art performance while using significantly less training data and requiring less computational time. Specifically, when we train a baseline Transformer model for SR with only 0.68\% of the original dataset, the performance drop is just 0.3 dB. In this case, diffusion model fine-tuning takes 4 hours, and SR model training completes within 1 hour, much shorter than the 11-hour training time with the full dataset.
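The first stage described above, keeping only high-gradient patches, can be sketched roughly as follows. This is an illustrative approximation rather than the authors' procedure: the finite-difference score, the helper names (`gradient_score`, `extract_patches`, `select_high_gradient`), and the top-`keep` selection rule are all assumptions, and the CLIP-based categorization and diffusion fine-tuning stages are not covered.

```python
def gradient_score(patch):
    """Mean absolute finite difference (horizontal and vertical) of a 2-D patch."""
    height, width = len(patch), len(patch[0])
    total = 0.0
    count = 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:
                total += abs(patch[y][x + 1] - patch[y][x])
                count += 1
            if y + 1 < height:
                total += abs(patch[y + 1][x] - patch[y][x])
                count += 1
    return total / count if count else 0.0


def extract_patches(image, size, stride):
    """Return ((row, col), patch) pairs for square patches on a regular grid."""
    height, width = len(image), len(image[0])
    patches = []
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            patch = [row[x:x + size] for row in image[y:y + size]]
            patches.append(((y, x), patch))
    return patches


def select_high_gradient(image, size, stride, keep):
    """Keep the `keep` patches with the largest gradient score (assumed rule)."""
    patches = extract_patches(image, size, stride)
    ranked = sorted(patches, key=lambda item: gradient_score(item[1]), reverse=True)
    return ranked[:keep]
```

For example, on an image whose left half is flat and whose right half is a checkerboard, `select_high_gradient(image, 4, 4, 1)` returns the patch taken from the textured half.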
