MFSR-GAN: Multi-Frame Super-Resolution with Handheld Motion Modeling

Fadeel Sher Khan, Joshua Ebenezer, Hamid Sheikh, Seok-Jun Lee

TL;DR

This work tackles real-world handheld multi-frame super-resolution by bridging a realism gap in training data and improving frame fusion. It introduces a synthetic data engine that preserves sensor-specific noise and temporally correlated handheld motion by warping high-resolution static captures with real handheld homographies and nearest-neighbor downsampling, paired with an 8-frame RAW-to-RGB MFSR-GAN. The network emphasizes a base frame through Reference Difference Computation and deformable-alignment-based fusion across multiple scales, aided by RRDB-based reconstruction and a relativistic GAN framework for perceptual quality. Experiments on both synthetic and real handheld bursts show sharper, more realistic reconstructions than prior methods, highlighting the practical potential for improved smartphone imaging under challenging conditions.

Abstract

Smartphone cameras have become ubiquitous imaging tools, yet their small sensors and compact optics often limit spatial resolution and introduce distortions. Combining information from multiple low-resolution (LR) frames to produce a high-resolution (HR) image has been explored to overcome the inherent limitations of smartphone cameras. Despite the promise of multi-frame super-resolution (MFSR), current approaches are hindered by datasets that fail to capture the characteristic noise and motion patterns found in real-world handheld burst images. In this work, we address this gap by introducing a novel synthetic data engine that uses multi-exposure static images to synthesize LR-HR training pairs while preserving sensor-specific noise characteristics and image motion found during handheld burst photography. We also propose MFSR-GAN: a multi-scale RAW-to-RGB network for MFSR. Compared to prior approaches, MFSR-GAN emphasizes a "base frame" throughout its architecture to mitigate artifacts. Experimental results on both synthetic and real data demonstrate that MFSR-GAN trained with our synthetic engine yields sharper, more realistic reconstructions than existing methods for real-world MFSR.
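The data-engine idea summarized above (warp a static HR capture with per-frame homographies, then downsample with nearest-neighbor sampling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `warp_homography_nn`, `downsample_nn`, and `synthesize_burst` are hypothetical, the homographies here are made-up pixel shifts rather than motion measured from real handheld bursts, and the multi-exposure capture and RAW handling described in the paper are left out entirely.

```python
import numpy as np


def warp_homography_nn(image, H):
    """Warp `image` by a 3x3 homography H (source -> destination pixel coords).

    Uses inverse mapping with nearest-neighbor sampling, so every output pixel
    is a copy of one input pixel (or zero if it maps outside the image).
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous destination coordinates, one column per pixel (row-major order).
    dst = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ dst
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    flat_in = image.reshape(h * w, -1)
    flat_out = np.zeros_like(flat_in)
    flat_out[valid] = flat_in[sy[valid] * w + sx[valid]]
    return flat_out.reshape(image.shape)


def downsample_nn(image, scale):
    """Nearest-neighbor decimation: keep every `scale`-th pixel, no averaging."""
    return image[::scale, ::scale]


def synthesize_burst(hr, homographies, scale=2):
    """Build one LR frame per homography from a single static HR image."""
    return [downsample_nn(warp_homography_nn(hr, H), scale) for H in homographies]


# Toy example: an identity "base frame" plus a one-pixel horizontal shift.
# A real engine would use homographies estimated from handheld captures.
hr = np.arange(64, dtype=np.float32).reshape(8, 8)
shift = np.array([[1, 0, 1], [0, 1, 0], [0, 0, 1]], dtype=float)
burst = synthesize_burst(hr, [np.eye(3), shift], scale=2)
```

Because both the warp and the decimation only copy existing pixel values, no interpolation averages neighboring pixels together. That is one plausible reading of how nearest-neighbor sampling helps keep the per-pixel noise of the original capture intact, as the TL;DR states.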

Paper Structure

This paper contains 17 sections, 9 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: We train a novel multi-frame RAW-to-RGB super-resolution network with a novel synthetic data engine. When tested on real handheld smartphone captures, it generates sharper and more detailed images than the existing state-of-the-art model [dudhane_burstormer_2023] trained on an existing synthetic dataset [ignatov_replacing_2020].
  • Figure 2: Uniform distribution motion
  • Figure 3: Real handheld motion
  • Figure 5: Overview of MFSR-GAN. Descriptions of modules in Section \ref{sec:mfsr-net} and further details in supplementary materials.
  • Figure 6: Multi-frame super-resolution qualitative results for real handheld burst photography under low-light conditions for MFSR-GAN and state-of-the-art Burstormer [dudhane_burstormer_2023]. Both models were fully trained using our synthetic data engine.
  • ...and 5 more figures