Instance Data Condensation for Image Super-Resolution
Tianhao Peng, Ho Man Kwan, Yuxuan Jiang, Ge Gao, Fan Zhang, Xiaozhong Xu, Shan Liu, David Bull
TL;DR
This work tackles the data inefficiency of image super-resolution (ISR) by proposing Instance Data Condensation (IDC), an ISR-specific condensation framework that synthesizes a small yet informative set of low-resolution (LR) patches through Random Local Fourier Features and Multi-level Feature Distribution Matching. IDC operates at the instance level and supports condensation at a ratio $r$ (e.g., $r=0.1$). On DIV2K, the resulting synthetic data train ISR models to comparable or even superior PSNR/SSIM on standard benchmarks, with greater training stability. The method is a two-stage pipeline: (i) learn synthetic LR patches per image via distribution matching against real patches, using a teacher model; and (ii) up-sample these patches to high resolution (HR) with the teacher to form the condensed dataset. The distribution matching relies on Random Local Fourier Features to capture high-frequency, local texture details, enabling faithful distribution alignment. Empirically, IDC outperforms several core-set and pruning baselines at 10% data and in many cases surpasses training on the full dataset, which the authors describe as a first for ISR data condensation with such a small synthetic corpus.
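To make the idea of matching feature distributions on local texture more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it uses the standard random Fourier feature map $z(x) = \sqrt{2/D}\,\cos(Wx + b)$ applied to flattened $k \times k$ windows, and measures the squared distance between the mean embeddings of two window sets (an MMD-style estimate). The names `make_rff`, `local_windows`, and `matching_loss`, along with the `bandwidth` and window size `k`, are hypothetical choices made for illustration.

```python
import math
import random


def make_rff(dim, num_features, bandwidth=1.0, seed=0):
    """Return a random Fourier feature map z(x) = sqrt(2/D) * cos(Wx + b).

    Assumption: W ~ N(0, 1/bandwidth^2) and b ~ U[0, 2*pi), the standard
    construction that approximates a Gaussian kernel.
    """
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
               for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, phases)]

    return feature_map


def local_windows(patch, k):
    """Flatten every k x k window of a 2-D patch given as a list of rows."""
    height, width = len(patch), len(patch[0])
    return [[patch[i + di][j + dj] for di in range(k) for dj in range(k)]
            for i in range(height - k + 1)
            for j in range(width - k + 1)]


def mean_embedding(windows, feature_map):
    """Average the feature vectors of all windows."""
    feats = [feature_map(w) for w in windows]
    return [sum(column) / len(feats) for column in zip(*feats)]


def matching_loss(real_windows, synth_windows, feature_map):
    """Squared distance between mean embeddings of two window sets."""
    mu_real = mean_embedding(real_windows, feature_map)
    mu_synth = mean_embedding(synth_windows, feature_map)
    return sum((a - b) ** 2 for a, b in zip(mu_real, mu_synth))


# Toy example: a smooth "real" patch versus a flat "synthetic" patch.
real_patch = [[(i + j) / 10.0 for j in range(6)] for i in range(6)]
synth_patch = [[0.5] * 6 for _ in range(6)]
rff = make_rff(dim=9, num_features=64, bandwidth=0.5)
loss = matching_loss(local_windows(real_patch, 3),
                     local_windows(synth_patch, 3), rff)
print(f"distribution-matching loss: {loss:.4f}")
```

In a condensation setting, a loss of this kind would be minimized with respect to the synthetic patch pixels using a differentiable framework; the pure-Python version above only evaluates it.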
Abstract
Deep learning-based image super-resolution (ISR) relies on large training datasets to optimize model generalization, which demands substantial computational and storage resources during training. While dataset condensation has shown potential for improving data efficiency and privacy in high-level computer vision tasks, it has not yet been fully exploited for ISR. In this paper, we propose a novel Instance Data Condensation (IDC) framework specifically for ISR, which achieves instance-level data condensation through Random Local Fourier Feature Extraction and Multi-level Feature Distribution Matching. The framework optimizes feature distributions at both global and local levels to obtain high-quality synthesized training content with fine detail. We use it to condense DIV2K, the most commonly used training dataset for ISR, at a 10% condensation rate. When used to train various popular ISR models, the resulting synthetic dataset delivers performance comparable to, and in certain cases better than, the original full dataset, together with excellent training stability. To the best of our knowledge, this is the first time that a condensed/synthetic dataset (with a 10% data volume) has demonstrated such performance. The source code and the synthetic dataset have been made available at https://github.com/.
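Comparisons between models trained on condensed and full data are reported in PSNR (alongside SSIM). As a reference for how that figure is defined, the sketch below computes $\mathrm{PSNR} = 10 \log_{10}(\mathrm{MAX}^2 / \mathrm{MSE})$ over two flat pixel sequences. The function name `psnr` and the default `max_value=255.0` are assumptions for illustration; ISR evaluations commonly also convert to the luminance channel and crop image borders, which this sketch omits.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(estimate):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: each pixel is off by 10 intensity levels.
print(f"{psnr([100, 100], [110, 90]):.2f} dB")
```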
