LocalSR: Image Super-Resolution in Local Region
Bo Ji, Angela Yao
TL;DR
This paper defines LocalSR, a task that restores only a designated region of interest (ROI) within a low-resolution image, reducing computation while preserving detail where it matters. It introduces CLSR, a context-based local SR framework with a base ROI branch and two context modules: a Global Context Module (GCM) that retrieves similar content from across the image, and a Proximity Integration Module (PIM) that draws on the ROI's immediate surroundings, enabling robust ROI restoration. GCM uses a patch-based cross-attention mechanism with downsampled queries, keys, and values and a distance-aware bias, while PIM crops ROI-aligned features from the context and fuses them with the ROI features; the down- and up-samplers are zero-initialized so that they start out as bilinear interpolation, for stability. Training relies on patch-based LR/HR pairs with a dual loss that supervises both the ROI and the surrounding context, and inference processes ROIs patch by patch. Experiments across CNN and Transformer SR backbones show improved ROI PSNR/SSIM and substantial FLOPs savings compared with pre-cropping and post-cropping baselines, particularly for small ROIs.
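To make the idea of cross-attention with a distance-aware bias concrete, here is a minimal pure-Python sketch. It is not the paper's GCM: the summary does not give the form of the bias, so the linear distance penalty, the `slope` parameter, and the function name `distance_biased_attention` are all assumptions chosen for illustration.

```python
import math


def distance_biased_attention(queries, keys, values, q_pos, k_pos, slope=0.1):
    """Cross-attention whose logits are reduced by slope * distance.

    queries: ROI patch embeddings (list of equal-length float lists)
    keys, values: context patch embeddings and features
    q_pos, k_pos: (row, col) patch centres used for the distance term
    slope: hypothetical penalty strength; the source gives no value
    """
    dim = len(queries[0])
    outputs = []
    for q, qp in zip(queries, q_pos):
        logits = []
        for k, kp in zip(keys, k_pos):
            # Scaled dot-product similarity between ROI and context patch.
            similarity = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            # Assumed bias: farther context patches are penalised linearly.
            distance = math.hypot(qp[0] - kp[0], qp[1] - kp[1])
            logits.append(similarity - slope * distance)
        # Numerically stable softmax over the context patches.
        peak = max(logits)
        weights = [math.exp(value - peak) for value in logits]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Attention output: weighted sum of the context values.
        outputs.append([
            sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))
        ])
    return outputs


# Two context patches with identical content but different distances.
out = distance_biased_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[1.0], [0.0]],
    q_pos=[(0, 0)],
    k_pos=[(0, 1), (0, 5)],
)
print(out[0][0] > 0.5)  # the nearer patch receives the larger weight
```

With `slope=0` the two identical context patches are weighted equally, so the bias is the only thing that separates them in this toy case.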
Abstract
Standard single-image super-resolution (SR) upsamples and restores entire images. Yet several real-world applications require higher resolutions only in specific regions, such as license plates or faces, making the super-resolution of the entire image, along with the associated memory and computational cost, unnecessary. We propose a novel task, called LocalSR, to restore only local regions of the low-resolution image. For this problem setting, we propose a context-based local super-resolution (CLSR) framework that super-resolves only specified regions of interest (ROIs) while leveraging the entire image as context. Our method uses three parallel processing modules: a base module for super-resolving the ROI, a global context module for gathering helpful features from across the image, and a proximity integration module for concentrating on areas surrounding the ROI, progressively propagating features from distant pixels to the target region. Experimental results indicate that our approach, with its low complexity, outperforms variants that focus exclusively on the ROI.
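A back-of-the-envelope calculation shows why restoring only the ROI can be much cheaper than restoring the whole image. The sketch below assumes that SR cost scales roughly with the number of input pixels, which is a reasonable first approximation for convolutional backbones but not a measurement from the paper; it also ignores the cost of any context modules. The function name and the image and ROI sizes are hypothetical.

```python
def roi_pixel_fraction(image_hw, roi_hw):
    """Fraction of low-resolution pixels that fall inside the ROI."""
    image_h, image_w = image_hw
    roi_h, roi_w = roi_hw
    if roi_h > image_h or roi_w > image_w:
        raise ValueError("ROI must fit inside the image")
    return (roi_h * roi_w) / (image_h * image_w)


# Hypothetical sizes: a 64x64 ROI inside a 512x512 low-resolution image.
fraction = roi_pixel_fraction((512, 512), (64, 64))
print(fraction)  # 0.015625, i.e. 1/64 of the pixels
```

Under this crude model, an ROI-only pass touches 1/64 of the pixels that a whole-image pass would, which is why the savings are largest for small ROIs.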
