InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction

Xulong Wang, Siyan Dong, Youyi Zheng, Yanchao Yang

TL;DR

InfoNorm introduces mutual information shaping of surface normals to regularize geometry in SDF-based NeRFs for sparse-view indoor reconstruction. By identifying geometrically correlated regions via multimodal semantic and monocular geometric features and enforcing an InfoNCE-style loss on normals, the method provides a robust, plug-in improvement to multiple baselines. It demonstrates consistent gains on ScanNet++ and Replica, with ablations validating the importance of feature fusion and the normal-based MI formulation. The approach offers a practical route to enhance 3D geometry without heavily altering model architectures, at the cost of some training-time overhead that scales with the underlying network. Overall, InfoNorm improves surface quality and sharpness in challenging sparse-view scenarios while maintaining compatibility with diverse NeRF/SDF pipelines.

Abstract

3D surface reconstruction from multi-view images is essential for scene understanding and interaction. However, complex indoor scenes pose challenges such as ambiguity due to limited observations. Recent implicit surface representations, such as Neural Radiance Fields (NeRFs) and signed distance functions (SDFs), employ various geometric priors to compensate for the lack of observed information. Nevertheless, their performance depends heavily on the quality of the pre-trained geometry estimation models. To ease this dependence, we propose regularizing the geometric modeling by explicitly encouraging the mutual information among surface normals of highly correlated scene points. In this way, the geometry learning process is modulated by the second-order correlations from noisy (first-order) geometric priors, thus eliminating the bias due to poor generalization. Additionally, we introduce a simple yet effective scheme that uses semantic and geometric features to identify correlated points and enhances their mutual information accordingly. The proposed technique can serve as a plugin for SDF-based neural surface representations. Our experiments demonstrate the effectiveness of the proposed technique in improving the surface reconstruction quality of major state-of-the-art methods. Our code is available at: https://github.com/Muliphein/InfoNorm.
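To make the idea of an InfoNCE-style loss on normals more concrete, here is a minimal NumPy sketch of a generic contrastive objective over unit normals. It is an illustration under stated assumptions, not the authors' formulation: the function name `info_nce_on_normals`, the use of cosine similarity, and the `temperature` value are all hypothetical choices, and the actual method may compute similarity on different quantities.

```python
import numpy as np

def info_nce_on_normals(anchor, positives, negatives, temperature=0.1):
    """Generic InfoNCE over surface normals (illustrative sketch only).

    anchor:    (3,)   normal at the anchor point
    positives: (P, 3) normals at points assumed to be correlated with the anchor
    negatives: (N, 3) normals at points assumed to be uncorrelated
    """
    def unit(v):
        # Normalize to unit length so the dot product is a cosine similarity.
        v = np.asarray(v, dtype=np.float64)
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    a, pos, neg = unit(anchor), unit(positives), unit(negatives)

    pos_logits = pos @ a / temperature   # (P,)
    neg_logits = neg @ a / temperature   # (N,)
    all_logits = np.concatenate([pos_logits, neg_logits])

    # Numerically stable log-sum-exp for numerator and denominator.
    m = all_logits.max()
    log_num = m + np.log(np.exp(pos_logits - m).sum())
    log_den = m + np.log(np.exp(all_logits - m).sum())

    # -log( sum_pos / (sum_pos + sum_neg) ), which is non-negative.
    return float(log_den - log_num)
```

Under these assumptions the loss is close to zero when the positives share the anchor's orientation and the negatives do not, and it grows when the positives point away from the anchor.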

Paper Structure (34 sections, 32 equations, 6 figures, 10 tables)


Figures (6)

  • Figure 1: 3D scene reconstruction from sparse views on Replica [straub2019replica] (first row) and ScanNet++ [yeshwanthliu2023scannetpp] (second row). By enforcing the mutual information between the normals of highly correlated scene points, the proposed method can effectively enhance the reconstruction quality of the baselines (VolSDF [yariv2021volume] and GeoNeuS [fu2022geo]).
  • Figure 2: An overview of the pipeline, where we apply mutual information shaping on the geometric branch to enforce consistencies that help enhance the surface reconstruction. Specifically, the model consists of the NeRF backbone, an SDF head, and a color head. The predicted density and color are supervised by the classic eikonal loss $L_E$ and the photometric reconstruction loss $L_C$, respectively. We encode geometry-aware mutual information into a subset of the parameters $\theta^D$ of $f(x)$ to constrain the density field learning for better surface quality, which is achieved by the proposed mutual information loss $L_M$ computed on top of the estimated surface normal.
  • Figure 3: An example of positive samples from different features. Given an anchor pixel (marked by a blue circle in the first image) on the wall-like cabinet, semantic features such as DINO often correlate the anchor with both the cabinet and the ceiling (as shown by the red area in the second image). Meanwhile, geometric features such as (noisy) monocular normals cannot distinguish between parallel planes that are not connected (the third image). By combining semantic and geometric features, we obtain positive samples for the anchor pixel with better geometric consistency and thus higher mutual correlation.
  • Figure 4: Visual results on the ScanNet++ dataset. Each row shows a comparison with a different baseline, from top to bottom: NeuS$^+$ [wang2021neus], VolSDF$^+$ [yariv2021volume], GeoNeuS$^+$ [fu2022geo], I$^2$-SDF$^+$ [zhu2023i2sdf], NeuRIS$^+$ [wang2022neuris], MonoSDF$^+$ [yu2022monosdf], and Neuralangelo$^+$ [li2023neuralangelo]. Red boxes are overlaid to highlight the differences.
  • Figure 5: Reconstruction results after 10K iterations in the training process with (NeuS$^+$) and without (NeuS) the proposed geometric shaping technique.
  • ...and 1 more figure
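The positive-sample selection described in the Figure 3 caption can be sketched as an intersection of two similarity tests: a pixel counts as a positive only if it agrees with the anchor both semantically and geometrically. The snippet below is a hedged illustration of that idea, not the authors' scheme. The function name `select_positives`, the cosine-similarity criterion, and the thresholds `sem_thresh` and `geo_thresh` are assumptions introduced here.

```python
import numpy as np

def select_positives(anchor_idx, semantic_feats, mono_normals,
                     sem_thresh=0.8, geo_thresh=0.9):
    """Return indices of pixels similar to the anchor in BOTH feature spaces.

    semantic_feats: (N, D) per-pixel semantic features (e.g. DINO-like)
    mono_normals:   (N, 3) per-pixel monocular normal estimates
    Thresholds are hypothetical values chosen for illustration.
    """
    sem = np.asarray(semantic_feats, dtype=np.float64)
    nrm = np.asarray(mono_normals, dtype=np.float64)
    sem = sem / np.linalg.norm(sem, axis=1, keepdims=True)
    nrm = nrm / np.linalg.norm(nrm, axis=1, keepdims=True)

    sem_sim = sem @ sem[anchor_idx]   # cosine similarity of semantic features
    geo_sim = nrm @ nrm[anchor_idx]   # cosine similarity of normals

    # Semantic features alone may link cabinet and ceiling; normals alone may
    # link disconnected parallel planes. Requiring both filters out each case.
    mask = (sem_sim >= sem_thresh) & (geo_sim >= geo_thresh)
    mask[anchor_idx] = False          # the anchor is not its own positive
    return np.flatnonzero(mask)
```

For example, a ceiling pixel with cabinet-like semantics but a perpendicular normal is rejected by the geometric test, while a distant parallel wall with a matching normal but different semantics is rejected by the semantic test.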