MaRINeR: Enhancing Novel Views by Matching Rendered Images with Nearby References
Lukas Bösiger, Mihai Dusmanu, Marc Pollefeys, Zuria Bauer
TL;DR
MaRINeR tackles artifact-laden novel-view renderings from imperfect 3D reconstructions by leveraging a nearby reference image to transfer contextual details through a multi-level feature matching and fusion framework. It employs an encoder–decoder architecture with a Matching and Extraction Module (MEM) to establish correspondences and a decoder with Spatial Adaptation Modules (SAM) and Dual Residual Aggregation Modules (DRAM) to fuse warped reference features into the rendering, guided by a loss $L = \lambda_{\text{rec}} L_{\text{rec}} + \lambda_{\text{per}} L_{\text{per}} + \lambda_{\text{adv}} L_{\text{adv}}$. Evaluations on the LaMAR dataset show MaRINeR surpasses RefSR and style-transfer baselines in PSNR, SSIM, LPIPS, and ERQA, while generalizing to unseen scenes and benefiting downstream tasks like pseudo-ground-truth validation and NeRF post-processing. The approach narrows the render-to-real gap, enables automated quality checks in mixed-reality pipelines, and provides a practical tool for enhancing synthetic data and neural renderings across diverse reconstruction modalities.
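As a minimal sketch of how the training objective above combines its terms, the weighted sum can be written as a small function. The subscripts are read here as reconstruction, perceptual, and adversarial terms, which is an assumption. The function name `total_loss` and the default weights are hypothetical placeholders, not values reported by the authors.

```python
def total_loss(l_rec, l_per, l_adv, lam_rec=1.0, lam_per=1.0, lam_adv=1.0):
    """Weighted sum L = lam_rec * L_rec + lam_per * L_per + lam_adv * L_adv.

    The individual loss values are assumed to be precomputed scalars;
    the default weights are placeholders, not the paper's settings.
    """
    return lam_rec * l_rec + lam_per * l_per + lam_adv * l_adv


if __name__ == "__main__":
    # Illustrative values only.
    print(total_loss(1.0, 2.0, 3.0, lam_rec=1.0, lam_per=0.5, lam_adv=0.1))
```

The weights set the trade-off between the three terms, so raising one of them makes its term count for more in the total.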
Abstract
Rendering realistic images from 3D reconstructions is an essential task in many Computer Vision and Robotics pipelines, notably for mixed-reality applications as well as for training autonomous agents in simulated environments. However, the quality of novel views heavily depends on the source reconstruction, which is often imperfect due to noisy or missing geometry and appearance. Inspired by the recent success of reference-based super-resolution networks, we propose MaRINeR, a refinement method that leverages information from a nearby mapping image to improve the rendering of a target viewpoint. We first establish matches between the raw rendered image of the scene geometry from the target viewpoint and the nearby reference based on deep features, followed by hierarchical detail transfer. We show improved renderings in quantitative metrics and qualitative examples from both explicit and implicit scene representations. We further employ our method on the downstream tasks of pseudo-ground-truth validation, synthetic data enhancement and detail recovery for renderings of reduced 3D reconstructions.
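The matching step described in the abstract can be illustrated with a toy sketch: for each feature vector of the rendering, find the most similar feature vector of the reference and copy over the detail stored at that location. This is not the authors' implementation. The use of cosine similarity, the flat lists of feature vectors, and the names `cosine`, `match_features`, and `transfer` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_features(rendered, reference):
    """For each rendered feature, return (index of best reference feature, score)."""
    matches = []
    for feat in rendered:
        scores = [cosine(feat, ref) for ref in reference]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((best, scores[best]))
    return matches


def transfer(reference_details, matches):
    """Gather reference details at the matched indices (a stand-in for warping)."""
    return [reference_details[idx] for idx, _ in matches]


if __name__ == "__main__":
    rendered = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical features of the rendering
    reference = [[0.0, 2.0], [3.0, 0.1]]   # hypothetical features of the reference
    matches = match_features(rendered, reference)
    print(transfer(["detail_0", "detail_1"], matches))
```

In a hierarchical setting, the same idea would be repeated at several feature resolutions. The sketch shows a single level only.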
