SuperF: Neural Implicit Fields for Multi-Image Super-Resolution
Sander Riisøen Jyhne, Christian Igel, Morten Goodwin, Per-Arne Andersen, Serge Belongie, Nico Lang
TL;DR
SuperF is a test-time optimization approach for multi-image super-resolution (MISR) that shares a single implicit neural representation across shifted low-resolution (LR) frames while jointly optimizing the sub-pixel frame alignments. By directly parameterizing the affine transforms and optimizing on a supersampled coordinate grid, it produces high-resolution reconstructions without relying on high-resolution training data, which reduces the risk of hallucinated details. The method optionally models per-pixel uncertainty with a Gaussian negative log-likelihood (GNLL) loss to handle occlusions and noise, and uses Fourier feature encodings to recover high-frequency details. Experiments on synthetic satellite and handheld bursts, plus real Sentinel-2 data, show consistent PSNR/LPIPS improvements over baselines, and ablations highlight the critical role of direct transform parameterization and supersampling. The work enables flexible MISR across domains and emphasizes practical robustness, though runtime and real-world occlusions remain areas for further refinement.
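As a rough illustration of two ingredients named above, the following NumPy sketch shows a Fourier feature encoding of 2D coordinates and a Gaussian negative log-likelihood with a per-pixel log-variance. It is not the authors' implementation: the function names, the random frequency matrix `B`, its scale of 10, and the choice to predict log-variance are all assumptions made for this example.

```python
import numpy as np

def fourier_features(coords, B):
    """Map coordinates (..., 2) to [sin(2*pi*Bx), cos(2*pi*Bx)] of shape (..., 2*m).

    B is an (m, 2) frequency matrix; drawing it at random is an assumption here.
    """
    proj = 2.0 * np.pi * coords @ B.T
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

def gaussian_nll(target, mean, log_var):
    """Mean Gaussian negative log-likelihood, dropping the constant 0.5*log(2*pi).

    Pixels with a large predicted variance contribute less to the squared
    error term, which is how uncertain (e.g. occluded) pixels get down-weighted.
    """
    return 0.5 * np.mean(log_var + (target - mean) ** 2 / np.exp(log_var))

rng = np.random.default_rng(0)
B = rng.normal(scale=10.0, size=(8, 2))      # hypothetical frequency matrix
coords = rng.uniform(-1.0, 1.0, size=(5, 2)) # five sample coordinates
encoded = fourier_features(coords, B)        # shape (5, 16)

target = rng.normal(size=(4, 4))
mean = rng.normal(size=(4, 4))
loss_unit_var = gaussian_nll(target, mean, np.zeros((4, 4)))
```

With `log_var` fixed at zero the loss reduces to half the mean squared error, so the uncertainty head only changes the objective where it predicts a non-unit variance.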
Abstract
High-resolution imagery is often hindered by limitations in sensor technology, atmospheric conditions, and costs. Such challenges occur in satellite remote sensing, but also with handheld cameras, such as our smartphones. Hence, super-resolution aims to enhance the image resolution algorithmically. Since single-image super-resolution requires solving an inverse problem, such methods must exploit strong priors, e.g. learned from high-resolution training data, or be constrained by auxiliary data, e.g. by a high-resolution guide from another modality. While qualitatively pleasing, such approaches often lead to "hallucinated" structures that do not match reality. In contrast, multi-image super-resolution (MISR) aims to improve the (optical) resolution by constraining the super-resolution process with multiple views taken with sub-pixel shifts. Here, we propose SuperF, a test-time optimization approach for MISR that leverages coordinate-based neural networks, also called neural fields. Their ability to represent continuous signals with an implicit neural representation (INR) makes them an ideal fit for the MISR task. The key characteristic of our approach is to share an INR for multiple shifted low-resolution frames and to jointly optimize the frame alignment with the INR. Our approach advances related INR baselines, adopted from burst fusion for layer separation, by directly parameterizing the sub-pixel alignment as optimizable affine transformation parameters and by optimizing via a super-sampled coordinate grid that corresponds to the output resolution. Our experiments yield compelling results on simulated bursts of satellite imagery and ground-level images from handheld cameras, with upsampling factors of up to 8. A key advantage of SuperF is that this approach does not rely on any high-resolution training data.
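The forward model described in the abstract can be sketched as follows: a coordinate grid is built at the output resolution, each frame's affine parameters warp that grid, the shared field is evaluated at the warped coordinates, and the result is reduced to the low-resolution frame size. This is a minimal sketch under stated assumptions, not the authors' code: `toy_field` stands in for the trained neural field, average pooling is an assumed downsampling operator, the coordinate range of [-1, 1] is a convention chosen here, and all function names are hypothetical.

```python
import numpy as np

def make_grid(h, w):
    """Pixel-centre coordinates in [-1, 1], returned as (h, w, 2) in (x, y) order."""
    ys = (np.arange(h) + 0.5) / h * 2.0 - 1.0
    xs = (np.arange(w) + 0.5) / w * 2.0 - 1.0
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)

def apply_affine(coords, theta):
    """Apply a 2x3 affine matrix [A | t] to coordinates of shape (..., 2)."""
    A, t = theta[:, :2], theta[:, 2]
    return coords @ A.T + t

def average_pool(img, s):
    """Average non-overlapping s x s blocks of a 2D array (assumed downsampling)."""
    h, w = img.shape[0] // s, img.shape[1] // s
    return img[: h * s, : w * s].reshape(h, s, w, s).mean(axis=(1, 3))

def render_lr_frame(field, theta, lr_shape, scale):
    """Evaluate the shared field on a supersampled, warped grid and pool to LR size."""
    h, w = lr_shape
    coords = make_grid(h * scale, w * scale)   # grid at the output resolution
    values = field(apply_affine(coords, theta))
    return average_pool(values, scale)

def toy_field(coords):
    """Hypothetical analytic stand-in for the shared implicit neural representation."""
    return np.sin(3.0 * coords[..., 0]) * np.cos(2.0 * coords[..., 1])

identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
shifted = identity.copy()
shifted[:, 2] = [0.01, -0.02]                  # a small sub-pixel translation

lr_a = render_lr_frame(toy_field, identity, (16, 16), scale=4)
lr_b = render_lr_frame(toy_field, shifted, (16, 16), scale=4)
```

In a test-time optimization setting, the field weights and each frame's `theta` would be updated together by gradient descent so that every rendered frame matches its observed low-resolution frame; an automatic differentiation framework would replace NumPy for that step.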
