$R^2$-Mesh: Reinforcement Learning Powered Mesh Reconstruction via Geometry and Appearance Refinement
Haoyang Wang, Liming Liu, Quanlu Jia, Jiangkai Wu, Haodan Zhang, Peiheng Wang, Xinggong Zhang
TL;DR
This work addresses high-fidelity mesh reconstruction from multi-view images by coupling NeRF-based initialization with online, reinforcement-learning-guided viewpoint enrichment and differentiable mesh refinement. Stage 1 trains a NeRF to produce a coarse signed distance field (SDF) and a view-dependent appearance field. Stage 2 uses an online Upper Confidence Bound (UCB) strategy to select NeRF-rendered viewpoints and jointly refines geometry and appearance through differentiable mesh extraction and rasterization; the final mesh is exported via the NeRF2Mesh workflow. The key contributions are a flexible two-stage refinement that updates both vertex positions and connectivity, an online UCB-based viewpoint enrichment technique that boosts rendering quality, and results on NeRF-synthetic scenes showing improvements in both geometric accuracy, measured by Chamfer Distance (CD), and rendering quality, measured by PSNR, SSIM, and LPIPS. The approach is applicable to NeRF-based mesh reconstruction frameworks more broadly, with potential impact on VR, medical imaging, and robotics workflows.
Abstract
Mesh reconstruction based on Neural Radiance Fields (NeRF) is popular in applications such as computer graphics, virtual reality, and medical imaging because it handles complex geometric structures efficiently and supports real-time rendering. However, existing methods often fail to capture fine geometric details accurately and struggle to optimize rendering quality. To address these challenges, we propose a novel algorithm that progressively generates and optimizes meshes from multi-view images. Our approach begins by training a NeRF model to establish an initial Signed Distance Field (SDF) and a view-dependent appearance field. We then iteratively refine the SDF through a differentiable mesh extraction method, continuously updating both the vertex positions and their connectivity based on the loss from differentiable mesh rasterization, while also optimizing the appearance representation. To further leverage the high-fidelity, detail-rich representation learned by NeRF, we propose an online-learning strategy based on the Upper Confidence Bound (UCB) that enriches the set of training viewpoints by adaptively adding images rendered by the initial NeRF model to the training dataset. Through extensive experiments, we demonstrate that our method delivers highly competitive and robust performance in both mesh rendering quality and geometric quality.
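To make the UCB-based viewpoint selection concrete, the following is a minimal sketch of the generic UCB1 rule applied to a fixed pool of candidate viewpoints. It is not the paper's implementation: the abstract does not specify how arms or rewards are defined, so the class name `UCB1ViewSelector`, the helper `simulate`, the exploration weight `c`, and the reward (a hypothetical per-view training gain, for example a loss reduction after adding that NeRF-rendered view) are all assumptions for illustration.

```python
import math


class UCB1ViewSelector:
    """Generic UCB1 bandit over candidate viewpoints (hypothetical arms)."""

    def __init__(self, n_views, c=math.sqrt(2)):
        self.counts = [0] * n_views   # times each view was selected
        self.means = [0.0] * n_views  # running mean reward per view
        self.c = c                    # exploration weight (assumed value)
        self.t = 0                    # total selections so far

    def select(self):
        # Try every view once before applying the confidence bound.
        for view, n in enumerate(self.counts):
            if n == 0:
                return view
        # Score = mean reward + c * sqrt(ln t / n): rarely tried views
        # receive a larger exploration bonus.
        log_t = math.log(self.t)
        scores = [
            mean + self.c * math.sqrt(log_t / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, view, reward):
        self.t += 1
        self.counts[view] += 1
        # Incremental update of the running mean.
        self.means[view] += (reward - self.means[view]) / self.counts[view]


def simulate(true_gains, rounds):
    """Run the selector against fixed, noise-free hypothetical rewards."""
    selector = UCB1ViewSelector(len(true_gains))
    for _ in range(rounds):
        view = selector.select()
        # Assumption: in a real pipeline the reward would be measured
        # after training on the selected NeRF-rendered view.
        selector.update(view, true_gains[view])
    return selector.counts
```

Calling `simulate([0.1, 0.5, 0.9], 500)` returns how often each of three hypothetical views was chosen. Most selections go to the view with the highest gain, while the other views are still revisited occasionally, which is the exploration and exploitation trade-off that UCB provides.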
