
Deep 3D World Models for Multi-Image Super-Resolution Beyond Optical Flow

Luca Savant Aira, Diego Valsesia, Andrea Bordone Molini, Giulia Fracastoro, Enrico Magli, Andrea Mirabile

TL;DR

This work tackles multi-image super-resolution (MISR) under arbitrary camera poses and large disparities, where optical-flow-based methods struggle. It introduces EpiMISR, a NeRF-inspired, epipolar-geometry–driven fusion framework that processes radiance feature fields with cascaded transformers to fuse information from multiple LR views for SR of a target view. Across DTU, Google Scanned Objects, and LLFF datasets, EpiMISR delivers substantial PSNR improvements over state-of-the-art MISR baselines and generalizes to unseen domains, while offering interpretable depth cues through ray attention. The approach supports any number of views and shows robustness to pose perturbations, with future work aimed at enhancing pose robustness and modeling more complex degradations beyond the pinhole model.

Abstract

Multi-image super-resolution (MISR) increases the spatial resolution of a low-resolution (LR) acquisition by combining multiple images that carry complementary information in the form of sub-pixel offsets in the scene sampling, and it can be significantly more effective than its single-image counterpart. Its main difficulty lies in accurately registering and fusing the multi-image information. Currently studied settings, such as burst photography, typically assume small geometric disparity between the LR images and rely on optical flow for image registration. We study a MISR method that can increase the resolution of sets of images acquired with arbitrary, and potentially wildly different, camera positions and orientations, generalizing the currently studied MISR settings. Our proposed model, called EpiMISR, moves away from optical flow and explicitly uses the epipolar geometry of the acquisition process, together with transformer-based processing of radiance feature fields, to substantially improve over state-of-the-art MISR methods in the presence of large disparities in the LR images.
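To make the role of epipolar geometry concrete, the following is a minimal sketch of how candidate depths along the ray of a target-view pixel map to points in another view. It is not the authors' implementation. It assumes an ideal pinhole camera, intrinsics shared by both views, and a source pose given as a rotation and translation relative to the target camera frame. All function and parameter names are hypothetical.

```python
def backproject(pixel, depth, intrinsics):
    """Lift a target-view pixel (u, v) to a 3D point at the given depth."""
    fx, fy, cx, cy = intrinsics
    u, v = pixel
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def project(point, rotation, translation, intrinsics):
    """Project a 3D point (in the target camera frame) into the source view."""
    fx, fy, cx, cy = intrinsics
    # Rigid transform into the source camera frame: x_src = R @ x + t
    x, y, z = (
        sum(r * p for r, p in zip(row, point)) + t
        for row, t in zip(rotation, translation)
    )
    # Perspective division followed by the pinhole intrinsics
    return (fx * x / z + cx, fy * y / z + cy)


def epipolar_samples(pixel, depths, rotation, translation, intrinsics):
    """Source-view locations of the target pixel's ray at each candidate depth.

    All returned points lie on the epipolar line of `pixel` in the source view.
    """
    return [
        project(backproject(pixel, d, intrinsics), rotation, translation, intrinsics)
        for d in depths
    ]


# Assumed toy setup: source camera shifted 0.5 units along x, no rotation.
intrinsics = (100.0, 100.0, 32.0, 32.0)  # fx, fy, cx, cy (illustrative values)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
translation = (-0.5, 0.0, 0.0)
samples = epipolar_samples((40.0, 20.0), [1.0, 2.0, 4.0], identity, translation, intrinsics)
```

In this toy setup the epipolar line is horizontal: every sample keeps the row of the target pixel, and only the column changes with depth. A model of this kind would read features at such locations instead of estimating a dense optical-flow field, which is why large disparities between views are less of a problem.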

Paper Structure (21 sections, 6 equations, 8 figures, 5 tables)


Figures (8)

  • Figure 1: EpiMISR Architecture. Super-resolved features are obtained from the LR target view and the extra views by any single-image SR network (SISR-FE), sampled along the epipolar lines associated with pixels in the target view (CAP), and fused (MIFF) to produce a residual correction to single-image SR.
  • Figure 2: DTU scene $3$ with $4\times$ scale factor. From left to right: LR nearest-neighbour interpolation (19.31 dB), NeRF-SR (19.75 dB), BSRT (23.60 dB), EpiMISR (24.43 dB), HR ground truth.
  • Figure 3: PSNR with respect to $V$ and $P$.
  • Figure 4: An example of depth map generation.
  • Figure 5: ECDF of the PSNR improvements of EpiMISR with respect to BSRT on the test split of the DTU dataset.
  • ...and 3 more figures