Instance Data Condensation for Image Super-Resolution

Tianhao Peng, Ho Man Kwan, Yuxuan Jiang, Ge Gao, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull

TL;DR

This work tackles the data inefficiency of image super-resolution (ISR) by proposing Instance Data Condensation (IDC), an ISR-specific condensation framework that synthesizes a small yet informative set of low-resolution (LR) patches through Random Local Fourier Features and Multi-level Feature Distribution Matching. IDC operates at the instance level, enabling condensation with a ratio $r$ (e.g., $r=0.1$). On DIV2K, the resulting synthetic data trains ISR models to comparable or even superior PSNR/SSIM on standard benchmarks, with greater training stability. The method uses a two-stage pipeline: (i) learn synthetic LR patches per image via distribution matching against real patches using a teacher model, and (ii) up-sample these patches to high resolution (HR) with the teacher to form the condensed dataset. Within this pipeline, Random Local Fourier Features capture high-frequency, local texture details, enabling faithful distribution alignment. Empirically, IDC outperforms several core-set and pruning baselines at 10% data and, in many cases, surpasses the performance of the full dataset; according to the authors, this is the first time an ISR data condensation method has achieved such results with a small synthetic corpus.
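To make the idea of matching local, high-frequency statistics concrete, the following is a minimal sketch and not the authors' implementation. It assumes 3x3 local windows, Gaussian random frequencies, and a squared distance between mean feature embeddings. The names `local_windows`, `fourier_features`, `mean_embedding`, and `feature_gap` are hypothetical, and the summary above does not specify the exact loss used by IDC.

```python
import math
import random


def local_windows(patch, k=3):
    """Return every k x k window of a 2-D patch (list of rows) as a flat list."""
    h, w = len(patch), len(patch[0])
    return [
        [patch[i + di][j + dj] for di in range(k) for dj in range(k)]
        for i in range(h - k + 1)
        for j in range(w - k + 1)
    ]


def fourier_features(window, freqs):
    """Map one window to cos/sin of its projection onto each random frequency."""
    feats = []
    for f in freqs:
        proj = sum(a * b for a, b in zip(window, f))
        feats.extend((math.cos(proj), math.sin(proj)))
    return feats


def mean_embedding(patches, freqs, k=3):
    """Average the Fourier features over all local windows of all patches."""
    total = [0.0] * (2 * len(freqs))
    count = 0
    for patch in patches:
        for window in local_windows(patch, k):
            for idx, value in enumerate(fourier_features(window, freqs)):
                total[idx] += value
            count += 1
    return [t / count for t in total]


def feature_gap(real, synth, freqs, k=3):
    """Squared distance between the mean embeddings of two patch sets."""
    mu_real = mean_embedding(real, freqs, k)
    mu_synth = mean_embedding(synth, freqs, k)
    return sum((a - b) ** 2 for a, b in zip(mu_real, mu_synth))


# Toy data (assumption): 4 textured 8x8 patches vs. 4 flat patches.
rng = random.Random(0)
freqs = [[rng.gauss(0.0, 1.0) for _ in range(9)] for _ in range(16)]
real = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(4)]
flat = [[[0.5] * 8 for _ in range(8)] for _ in range(4)]

gap_same = feature_gap(real, real, freqs)  # identical sets: no gap
gap_flat = feature_gap(real, flat, freqs)  # texture-free set: positive gap
```

In a condensation setting, a quantity like `feature_gap` would be minimized with respect to the synthetic patches. Here it is only evaluated, to show that a texture-free patch set is penalized relative to one that shares the local statistics of the real patches.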

Abstract

Deep-learning-based image super-resolution (ISR) relies on large training datasets to optimize model generalization; this requires substantial computational and storage resources during training. While dataset condensation has shown potential in improving data efficiency and privacy for high-level computer vision tasks, it has not yet been fully exploited for ISR. In this paper, we propose a novel Instance Data Condensation (IDC) framework specifically for ISR, which achieves instance-level data condensation through Random Local Fourier Feature Extraction and Multi-level Feature Distribution Matching. The framework optimizes feature distributions at both global and local levels to obtain high-quality synthesized training content with fine detail. It has been used to condense DIV2K, the most commonly used training dataset for ISR, with a 10% condensation rate. When used to train various popular ISR models, the resulting synthetic dataset offers comparable or (in certain cases) even better performance than the original full dataset, together with excellent training stability. To the best of our knowledge, this is the first time that a condensed/synthetic dataset (with a 10% data volume) has demonstrated such performance. The source code and the synthetic dataset have been made available at https://github.com/.
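The instance-level, 10% setting can be illustrated with a small sketch of the data flow for a single image. This is an assumption-laden outline, not the released code: `patch_budget`, `nearest_upsample`, and `condense_image` are hypothetical names, the synthetic LR patches are simply copied from real ones instead of being optimized by distribution matching, and a nearest-neighbour up-sampler stands in for the teacher ISR model.

```python
def patch_budget(num_real_patches, ratio=0.1):
    """Synthetic patches allotted to one image at the given ratio (at least one)."""
    return max(1, round(num_real_patches * ratio))


def nearest_upsample(lr_patch, scale=2):
    """Stand-in for the teacher model: repeat each pixel scale x scale times."""
    hr = []
    for row in lr_patch:
        wide = [v for v in row for _ in range(scale)]
        hr.extend([list(wide) for _ in range(scale)])
    return hr


def condense_image(real_lr_patches, ratio=0.1, scale=2, teacher=nearest_upsample):
    """Build (LR, HR) training pairs for one image.

    Stage 1 (placeholder): copy the first few real patches as the synthetic LR
    set; the optimization against real patches is omitted here.
    Stage 2: label every synthetic LR patch with the teacher's HR output.
    """
    n_keep = patch_budget(len(real_lr_patches), ratio)
    synth_lr = [[row[:] for row in p] for p in real_lr_patches[:n_keep]]
    return [(lr, teacher(lr, scale)) for lr in synth_lr]


# Toy image (assumption): 40 LR patches of size 4x4 taken from one image.
patches = [[[float(i)] * 4 for _ in range(4)] for i in range(40)]
pairs = condense_image(patches)  # 4 (LR, HR) pairs at ratio 0.1
```

Because the budget is computed per image, every training image contributes to the condensed set, which is one way to read "instance-level" in the abstract.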

Paper Structure

This paper contains 10 sections, 8 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: (Left) Visual comparison between synthetic patches generated by our IDC framework and those selected or synthesized by Random Selection, DCSR [ding2023not], and NCFD [wang2025NCFM] (v1 in our ablation study). IDC’s patches contain more diverse information than those from selection-based methods, while data condensation techniques designed for high-level vision, such as NCFD, fail to produce meaningful results. (Right) Quantitative results show that an IDC-synthesized dataset (10% volume) can outperform the full DIV2K dataset when training ISR models.
  • Figure 2: Illustration of the proposed Instance Data Condensation (IDC) framework.
  • Figure 3: (Left) Visual examples of our synthetic images. (Right) Validation trajectory on Set14.
  • Figure 4: Visual evolution of the synthetic patches as each contribution is added, starting from the original NCFD [wang2025NCFM] (v1).