SuperF: Neural Implicit Fields for Multi-Image Super-Resolution

Sander Riisøen Jyhne, Christian Igel, Morten Goodwin, Per-Arne Andersen, Serge Belongie, Nico Lang

TL;DR

SuperF introduces a test-time optimization approach for multi-image super-resolution (MISR) that shares a single implicit neural representation (INR) across shifted low-resolution (LR) frames while jointly optimizing sub-pixel frame alignments. By directly parameterizing affine transforms and employing a supersampling strategy, it achieves accurate high-resolution reconstructions without relying on high-resolution training data, reducing hallucinated details. The method optionally models per-pixel uncertainty with a Gaussian negative log-likelihood (GNLL) loss to handle occlusions and noise, and uses Fourier feature encodings to recover high-frequency details. Experimental results on synthetic satellite and handheld bursts, plus real Sentinel-2 data, show consistent PSNR/LPIPS improvements over baselines, with ablations highlighting the critical role of direct transform parameterization and supersampling. The work enables flexible MISR across domains and emphasizes practical robustness, though runtime and real-world occlusions remain areas for further refinement.
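
The two auxiliary ingredients named in the summary, Fourier feature encodings and a Gaussian negative log-likelihood loss, can be illustrated with a minimal NumPy sketch. The function names, the octave-spaced frequency schedule, and the log-variance parameterization are assumptions made for illustration; the paper may use a different encoding (for example, random frequencies) and a different uncertainty head.

```python
import numpy as np

def fourier_features(coords, num_bands=4):
    """Map (N, 2) coordinates to (N, 4 * num_bands) sin/cos features.

    Assumption: octave-spaced frequencies pi * 2^k; hypothetical choice.
    """
    freqs = 2.0 ** np.arange(num_bands) * np.pi        # (num_bands,)
    angles = coords[:, :, None] * freqs                # (N, 2, num_bands)
    angles = angles.reshape(coords.shape[0], -1)       # (N, 2 * num_bands)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

def gaussian_nll(pred, target, log_var):
    """Mean per-pixel Gaussian NLL with the constant term dropped.

    A large predicted log-variance shrinks the squared-error term, so
    unreliable pixels (e.g. clouds) contribute less to the loss.
    """
    sq_err = (pred - target) ** 2
    return np.mean(0.5 * (log_var + sq_err / np.exp(log_var)))
```

With `log_var` fixed to zero the loss reduces to half the mean squared error, which is one way to see that the uncertainty term is an optional extension of a plain reconstruction loss.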

Abstract

High-resolution imagery is often hindered by limitations in sensor technology, atmospheric conditions, and costs. Such challenges occur in satellite remote sensing, but also with handheld cameras, such as our smartphones. Hence, super-resolution aims to enhance the image resolution algorithmically. Since single-image super-resolution requires solving an inverse problem, such methods must exploit strong priors, e.g. learned from high-resolution training data, or be constrained by auxiliary data, e.g. by a high-resolution guide from another modality. While qualitatively pleasing, such approaches often lead to "hallucinated" structures that do not match reality. In contrast, multi-image super-resolution (MISR) aims to improve the (optical) resolution by constraining the super-resolution process with multiple views taken with sub-pixel shifts. Here, we propose SuperF, a test-time optimization approach for MISR that leverages coordinate-based neural networks, also called neural fields. Their ability to represent continuous signals with an implicit neural representation (INR) makes them an ideal fit for the MISR task. The key characteristic of our approach is to share an INR for multiple shifted low-resolution frames and to jointly optimize the frame alignment with the INR. Our approach advances related INR baselines, adopted from burst fusion for layer separation, by directly parameterizing the sub-pixel alignment as optimizable affine transformation parameters and by optimizing via a super-sampled coordinate grid that corresponds to the output resolution. Our experiments yield compelling results on simulated bursts of satellite imagery and ground-level images from handheld cameras, with upsampling factors of up to 8. A key advantage of SuperF is that this approach does not rely on any high-resolution training data.
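
To make the two design choices in the abstract concrete (a per-frame affine coordinate transform and a super-sampled coordinate grid at the output resolution), here is a minimal NumPy sketch of the forward pass for one frame. It is not the authors' implementation: `make_grid`, `apply_affine`, `render_lr`, the `[-1, 1]` coordinate convention, and average pooling as the downsampling operator are all assumptions, and `field` stands in for the shared coordinate-based MLP.

```python
import numpy as np

def make_grid(height, width):
    """Pixel-centre coordinates in [-1, 1], shape (height, width, 2) as (x, y)."""
    ys = (np.arange(height) + 0.5) / height * 2.0 - 1.0
    xs = (np.arange(width) + 0.5) / width * 2.0 - 1.0
    gx, gy = np.meshgrid(xs, ys)
    return np.stack([gx, gy], axis=-1)

def apply_affine(grid, theta):
    """Apply a 2x3 affine matrix theta to every (x, y) coordinate."""
    linear, shift = theta[:, :2], theta[:, 2]
    return grid @ linear.T + shift

def render_lr(field, theta, lr_shape, factor):
    """Query `field` on a super-sampled grid, then average-pool to LR size.

    Assumption: average pooling models the LR sensor; hypothetical choice.
    """
    h, w = lr_shape
    grid = apply_affine(make_grid(h * factor, w * factor), theta)
    hr = field(grid)                                   # (h*factor, w*factor, C)
    return hr.reshape(h, factor, w, factor, -1).mean(axis=(1, 3))

# Usage: a toy "field" that returns the x coordinate as a single channel.
theta_identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
frame = render_lr(lambda g: g[..., :1], theta_identity, (4, 6), factor=4)
```

In a test-time optimization of this kind, one `theta` per LR frame would be updated by gradient descent together with the field's weights, with the loss comparing each rendered frame against its observed LR frame. That step needs an autodiff framework and is omitted here.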

Paper Structure

This paper contains 38 sections, 5 equations, 13 figures, 8 tables.

Figures (13)

  • Figure 1: Illustration of the proposed method. SuperF achieves multi-image super-resolution by sharing an implicit neural representation (INR) across multiple low-resolution (LR) frames with sub-pixel shifts. The LR frames are aligned by jointly optimizing an affine coordinate transformation for each LR frame, together with the parameters of a coordinate-based multi-layer perceptron (MLP) that decodes the input coordinates to RGB values. The method thus leverages the continuous nature of INRs both for the sub-pixel alignment in the pixel coordinate space and for representing the underlying high-resolution (HR) signal. For robustness, the proposed INR can optionally represent additional frame-specific uncertainty maps to ignore noisy pixels (e.g. clouds) in the optimization.
  • Figure 2: Qualitative comparison with upsampling factor $\times$4. From left to right, we show: one low-resolution (LR) frame, bilinear upsampling, steerable kernel regression [lafenetre2023handheld], NIR [nam2022neural], our SuperF approach, and the high-resolution (HR) reference. Samples are from SyntheticBurst (rows 1 and 2) and SatSynthBurst (rows 3 and 4).
  • Figure 3: Qualitative examples using real satellite images. We demonstrate that our method can align and super-resolve real satellite images from the Sentinel-2 mission by an upsampling factor of 5, using a filtered Sentinel-2 time series. Depending on the cloud cover, this leads to a varying number of LR images retrieved within 3 to 5 months (number of images: A: 25, B: 15, C: 9, D: 7).
  • Figure 4: Examples of the SatSynthBurst dataset (factor $\times$4). The top row shows the underlying high-resolution (HR) image. Below we show four slightly misaligned low-resolution (LR) frames.
  • Figure 5: Examples of the SyntheticBurst dataset (factor $\times$8). The top row shows the underlying high-resolution (HR) image. Below we show four slightly misaligned low-resolution (LR) frames.
  • ...and 8 more figures