MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References

Lukas Bösiger, Mihai Dusmanu, Marc Pollefeys, Zuria Bauer

TL;DR

MaRINeR tackles artifact-laden novel-view renderings from imperfect 3D reconstructions by leveraging a nearby reference image to transfer contextual details through a multi-level feature matching and fusion framework. It employs an encoder–decoder architecture with a Matching and Extraction Module (MEM) to establish correspondences and a decoder with Spatial Adaptation Modules (SAM) and Dual Residual Aggregation Modules (DRAM) to fuse warped reference features into the rendering, guided by a loss $L = \lambda_{\text{rec}} L_{\text{rec}} + \lambda_{\text{per}} L_{\text{per}} + \lambda_{\text{adv}} L_{\text{adv}}$. Evaluations on the LaMAR dataset show MaRINeR surpasses RefSR and style-transfer baselines in PSNR, SSIM, LPIPS, and ERQA, while generalizing to unseen scenes and benefiting downstream tasks like pseudo-ground-truth validation and NeRF post-processing. The approach narrows the render-to-real gap, enables automated quality checks in mixed-reality pipelines, and provides a practical tool for enhancing synthetic data and neural renderings across diverse reconstruction modalities.
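As a minimal sketch of how the three loss terms named above combine, the following plain-Python function computes the weighted sum $L = \lambda_{\text{rec}} L_{\text{rec}} + \lambda_{\text{per}} L_{\text{per}} + \lambda_{\text{adv}} L_{\text{adv}}$. The function name `total_loss`, the default weights, and the example values are hypothetical and are not taken from the paper; how each individual term is computed is not covered here.

```python
def total_loss(l_rec, l_per, l_adv,
               lambda_rec=1.0, lambda_per=1.0, lambda_adv=1.0):
    """Weighted sum of reconstruction, perceptual, and adversarial terms.

    The default weights of 1.0 are placeholders (an assumption), not the
    values used by the authors.
    """
    return lambda_rec * l_rec + lambda_per * l_per + lambda_adv * l_adv


# Hypothetical per-term values and weights, chosen only for illustration.
example = total_loss(0.5, 0.25, 2.0,
                     lambda_rec=1.0, lambda_per=2.0, lambda_adv=0.5)
print(example)  # 0.5 + 0.5 + 1.0 = 2.0
```

In practice the weights trade off pixel fidelity against perceptual and adversarial realism, so they would normally be tuned per dataset.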

Abstract

Rendering realistic images from 3D reconstructions is an essential task in many Computer Vision and Robotics pipelines, notably for mixed-reality applications and for training autonomous agents in simulated environments. However, the quality of novel views depends heavily on the source reconstruction, which is often imperfect due to noisy or missing geometry and appearance. Inspired by the recent success of reference-based super-resolution networks, we propose MaRINeR, a refinement method that leverages information from a nearby mapping image to improve the rendering of a target viewpoint. We first establish matches, based on deep features, between the raw rendered image of the scene geometry from the target viewpoint and the nearby reference, and then perform hierarchical detail transfer. We show improved renderings in quantitative metrics and qualitative examples for both explicit and implicit scene representations. We further apply our method to the downstream tasks of pseudo-ground-truth validation, synthetic data enhancement, and detail recovery for renderings of reduced 3D reconstructions.
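The matching step described in the abstract can be illustrated with a small sketch: each rendering feature is paired with its most similar reference feature, and the reference content at the matched location is gathered ("warped") into the rendering's layout. This is a simplified nearest-neighbor illustration using cosine similarity in plain Python, not the authors' Matching and Extraction Module; the function names, the toy feature vectors, and the use of cosine similarity are all assumptions made for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_features(render_feats, ref_feats):
    """For each rendering feature, return (best reference index, similarity).

    Assumes ref_feats is non-empty. Ties resolve to the lowest index.
    """
    matches = []
    for query in render_feats:
        sims = [cosine(query, key) for key in ref_feats]
        best = max(range(len(sims)), key=sims.__getitem__)
        matches.append((best, sims[best]))
    return matches


def warp_reference(ref_values, matches):
    """Gather reference values so they line up with the rendering positions."""
    return [ref_values[idx] for idx, _ in matches]


# Toy data (hypothetical): two rendering locations, three reference locations.
render_feats = [[1.0, 0.0], [0.0, 2.0]]
ref_feats = [[0.0, 1.0], [3.0, 0.0], [1.0, 1.0]]
ref_values = ["sky", "wall", "floor"]

matches = match_features(render_feats, ref_feats)
print([idx for idx, _ in matches])          # [1, 0]
print(warp_reference(ref_values, matches))  # ['wall', 'sky']
```

The similarity score returned alongside each index could serve as a confidence weight when fusing the warped reference features with the rendering features, since low-similarity matches are less trustworthy.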

Paper Structure

This paper contains 14 sections, 12 equations, 26 figures, and 5 tables.

Figures (26)

  • Figure 1: We introduce MaRINeR: a pipeline that takes as input a novel view rendered from a 3D reconstruction exhibiting geometric and/or appearance artifacts and inaccuracies, together with a nearby reference image used during the reconstruction process, and outputs an enhanced version of the novel view through feature matching and transfer.
  • Figure 2: Robustness of MaRINeR. Our model recovers missing parts caused by rendering artifacts (a, b). It adopts the illumination from the reference (c) and is device-agnostic, generalizing to gray-scale images (d). The model enhances renderings of low-triangle-count meshes (e) and improves the rendering even when the reference has little content in common with it (f). It can be applied without retraining to unseen scenes such as 12 Scenes [12scenes] (g) or Aachen Day-Night [AACHEN] (h).
  • Figure 3: MaRINeR architecture. The learned encoder features are used for correspondence matching and for warping the reference features. The warped features are fused with the rendering features to create an enhanced rendering, which is iteratively refined.
  • Figure 4: Architecture of the decoder. We fuse the rendering and warped reference features using SAM [MASASR], DRAM [MASASR], and residual blocks [ResBlock].
  • Figure 5: Common dataset challenges. The rendering and the ground truth (GT) can contain different objects, some of which may be artifacts. The illumination can also differ because of time-of-day or seasonal changes.
  • ...and 21 more figures