FAWN: Floor-And-Walls Normal Regularization for Direct Neural TSDF Reconstruction

Anna Sokolova, Anna Vorontsova, Bulat Gabdullin, Alexander Limonov

TL;DR

This work addresses the lack of explicit global indoor-geometry constraints in direct TSDF reconstruction. It introduces FAWN, a floor-and-walls normal regularization that uses a trainable 3D semantic head to identify walls and floors and imposes vertical/horizontal orientation on their normals during training, via a composite loss including ${\cal L}_{FAWN}$. The approach improves reconstruction quality and completeness across multiple baselines and benchmarks (e.g., ScanNet, TUM RGB-D, ICL-NUIM, 7Scenes), without adding inference-time overhead or requiring semantics at test time. By integrating semantic guidance with normal-based regularization, FAWN enhances planar region fidelity and hole filling, enabling more accurate indoor scene reconstructions in practical settings.

Abstract

Leveraging 3D semantics for direct 3D reconstruction has great potential that is yet to be unleashed. For instance, by assuming that walls are vertical and that a floor is planar and horizontal, we can correct distorted room shapes and eliminate local artifacts such as holes, pits, and hills. In this paper, we propose FAWN, a modification of truncated signed distance function (TSDF) reconstruction methods that accounts for scene structure by detecting the walls and floor in a scene and penalizing the corresponding surface normals for deviating from the horizontal and vertical directions. Implemented as a 3D sparse convolutional module, FAWN can be incorporated into any trainable pipeline that predicts TSDF. Since FAWN requires 3D semantics only for training, it imposes no additional limitations on further use. We demonstrate that FAWN-modified methods use semantics more effectively than existing semantic-based approaches. In addition, we apply our modification to state-of-the-art TSDF reconstruction methods and demonstrate a quality gain on the ScanNet, ICL-NUIM, TUM RGB-D, and 7Scenes benchmarks.

Paper Structure (19 sections, 4 equations, 5 figures, 3 tables)

Figures (5)

  • Figure 1: 3D reconstructions by the state-of-the-art VoRTX (Stier et al., 2021) and VoRTX+FAWN. The FAWN modification improves the reconstruction of planar regions and fills holes, even in areas not covered by the ground truth.
  • Figure 2: We verify the FAWN assumption on ScanNet. Due to imperfect ground truth scans, floor and wall normals deviate from the vertical and horizontal directions, respectively, yet the median deviation should be minor. In fact, for 90% of scenes, the median angle between floor normals and the vertical direction is within $5.1^\circ$, while the median angle between wall normals and the horizontal direction is within $6.3^\circ$.
  • Figure 3: Training procedure of a FAWN-modified method, with the baseline components of the pipeline and the FAWN add-ons specified. A set of RGB images with camera poses is first processed with a backbone. The extracted features are passed to a baseline TSDF head and to the FAWN auxiliary semantic head, which detects walls and floor. The semantic head is guided by a semantic loss ${\cal L}_{sem}$ during training, while the TSDF head is trained to minimize baseline-specific TSDF losses ${\cal L}_{TSDF}$. Surface normals are derived from the predicted TSDF as first-order gradients and are used to regularize geometry via normal losses ${\cal L}_{norm}$. Our key novelty is in combining normals and semantics in ${\cal L}_{FAWN}$: surface normals in wall regions are penalized for deviating from the horizontal direction, while the constraints imposed on floor normals force them to be vertical. As a result, the floor and walls in a reconstructed scene become smoother and more planar, and holes get filled with planar segments.
  • Figure 4: 3D reconstructions of a ScanNet scene obtained with baseline methods and their FAWN-modified versions.
  • Figure 5: 3D scans of a 7Scenes scene reconstructed with and without FAWN. We report coverage to provide visual intuition: the higher the coverage score, the fewer and smaller the gaps in the reconstructed scans. The lowest score of 29.0 is obtained with the original NeuralRecon, which reconstructs only a small part of the scan. Atlas yields scans with full coverage, but they are too coarse and over-smoothed, which is reflected in the values of the other metrics. VisFusion has a coverage of 56.8, and its scan is visibly incomplete. VoRTX provides almost full coverage with the best reconstruction quality.
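
The regularization described in the Figure 3 caption (normals taken as first-order gradients of the predicted TSDF, then penalized according to wall/floor semantics) can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the exact form of ${\cal L}_{FAWN}$ is not given in this summary, so the cosine-based penalty, the z-up vertical axis, the string labels, and the function names `tsdf_normal`, `fawn_penalty`, and `fawn_loss` are all assumptions made for illustration. A real pipeline would compute this on sparse tensors in a differentiable framework, with labels coming from the semantic head.

```python
import math

UP = (0.0, 0.0, 1.0)  # assumption: scenes are gravity-aligned with z pointing up


def tsdf_normal(tsdf, i, j, k, voxel_size=1.0):
    """Unit normal at an interior voxel, from central differences of the TSDF."""
    gx = (tsdf[i + 1][j][k] - tsdf[i - 1][j][k]) / (2.0 * voxel_size)
    gy = (tsdf[i][j + 1][k] - tsdf[i][j - 1][k]) / (2.0 * voxel_size)
    gz = (tsdf[i][j][k + 1] - tsdf[i][j][k - 1]) / (2.0 * voxel_size)
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        return None  # flat TSDF neighbourhood: the normal is undefined
    return (gx / norm, gy / norm, gz / norm)


def fawn_penalty(normal, label):
    """Hypothetical per-voxel penalty (an assumed form, not the paper's formula).

    floor: 1 - |n . up|  -> zero when the normal is vertical
    wall:  |n . up|      -> zero when the normal is horizontal
    other: 0             -> no constraint
    """
    cos_up = abs(sum(n * u for n, u in zip(normal, UP)))
    if label == "floor":
        return 1.0 - cos_up
    if label == "wall":
        return cos_up
    return 0.0


def fawn_loss(tsdf, labels, voxel_size=1.0):
    """Mean penalty over interior voxels labelled 'floor' or 'wall'."""
    total, count = 0.0, 0
    nx, ny, nz = len(tsdf), len(tsdf[0]), len(tsdf[0][0])
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                label = labels[i][j][k]
                if label not in ("floor", "wall"):
                    continue
                normal = tsdf_normal(tsdf, i, j, k, voxel_size)
                if normal is None:
                    continue
                total += fawn_penalty(normal, label)
                count += 1
    return total / count if count else 0.0


# Toy 3x3x3 grids: a perfectly horizontal surface and a tilted one.
flat = [[[float(k - 1) for k in range(3)] for _ in range(3)] for _ in range(3)]
tilted = [[[(k - 1) + 0.5 * (i - 1) for k in range(3)] for _ in range(3)]
          for i in range(3)]
floor_labels = [[["floor"] * 3 for _ in range(3)] for _ in range(3)]

print(fawn_loss(flat, floor_labels))    # horizontal floor: no penalty
print(fawn_loss(tilted, floor_labels))  # tilted floor: positive penalty
```

On the toy grids, the horizontal surface labelled as floor incurs no penalty, while the tilted one does. Labelling the same horizontal surface as a wall would give the maximum penalty, which is the behaviour the caption describes: floor normals are pushed toward the vertical direction and wall normals toward the horizontal one.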