NEAT: Distilling 3D Wireframes from Neural Attraction Fields

Nan Xue, Bin Tan, Yuxi Xiao, Liang Dong, Gui-Song Xia, Tianfu Wu, Yujun Shen

TL;DR

NEAT tackles multi-view 3D wireframe reconstruction by introducing rendering-distilling neural fields that jointly model 3D line segments and global 3D junctions without explicit cross-view feature matching. The approach combines Neural Attraction Fields to render 3D line segments from 2D wireframes with a Global 3D Junction Perceiver that distills a sparse junction set, followed by a distillation step that aligns lines to junctions and refines the result with SDF-based optimization. Quantitative results on DTU and BlendedMVS show that NEAT yields superior accuracy and completeness over matching-based methods, while qualitative results demonstrate coherent, compact wireframes even in curve-dominated scenes; the distilled junctions also provide a strong initialization for 3D Gaussian Splatting, enabling high-quality rendering with dramatically fewer initial points. Overall, NEAT presents a first-of-its-kind, end-to-end, matching-free framework for 3D wireframe parsing that leverages neural implicit representations to produce compact, editable geometric primitives with practical impact for subsequent neural rendering tasks.
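The distillation step mentioned above can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes that "aligning lines to junctions" can be approximated by snapping each 3D line endpoint to its nearest junction within a distance threshold, and it omits the SDF-based refinement entirely. The function name `snap_lines_to_junctions` and the parameter `max_dist` are hypothetical.

```python
import math


def snap_lines_to_junctions(lines, junctions, max_dist):
    """Return sorted unique junction-index pairs (i, j), i < j, one per kept line.

    Assumption (not from the paper): a line is kept only if both endpoints
    lie within `max_dist` of two distinct junctions.
    """
    def nearest(point):
        # Index of, and distance to, the junction closest to `point`.
        best = min(range(len(junctions)),
                   key=lambda k: math.dist(point, junctions[k]))
        return best, math.dist(point, junctions[best])

    edges = set()
    for start, end in lines:
        (i, dist_i), (j, dist_j) = nearest(start), nearest(end)
        if dist_i <= max_dist and dist_j <= max_dist and i != j:
            # Storing index pairs in a set merges redundant line segments.
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)


# Toy data: three junctions and a noisy, redundant "line cloud".
junctions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
lines = [
    ((0.02, 0.01, 0.0), (0.97, 0.0, 0.01)),  # near junctions 0 and 1
    ((0.0, -0.01, 0.0), (1.01, 0.02, 0.0)),  # redundant copy of the same edge
    ((1.0, 0.03, 0.0), (0.98, 1.0, 0.0)),    # near junctions 1 and 2
    ((0.5, 0.5, 0.5), (1.0, 1.0, 0.0)),      # one endpoint far from all junctions
]
edges = snap_lines_to_junctions(lines, junctions, max_dist=0.1)
print(edges)
```

On this toy input, the four noisy segments collapse to two wireframe edges, which is the sense in which a sparse junction set makes the dense line cloud compact.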

Abstract

This paper studies the problem of structured 3D reconstruction using wireframes that consist of line segments and junctions, focusing on the computation of structured boundary geometries of scenes. Instead of relying on matching-based solutions that lift 2D wireframes (or line segments) to 3D, as done in prior art, we present NEAT, a rendering-distilling formulation that uses neural fields to represent 3D line segments from 2D observations, and bipartite matching to perceive and distill a sparse set of 3D global junctions. NEAT jointly optimizes the neural fields and the global junctions from scratch, using view-dependent 2D observations without precomputed cross-view feature matching. Comprehensive experiments on the DTU and BlendedMVS datasets demonstrate NEAT's superiority over state-of-the-art alternatives for 3D wireframe reconstruction. Moreover, the 3D global junctions distilled by NEAT are a better initialization than SfM points for the recently emerged 3D Gaussian Splatting, enabling high-fidelity novel view synthesis with about 20 times fewer initial 3D points. Project page: https://xuenan.net/neat
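To make the role of bipartite matching concrete, here is a minimal sketch of a one-to-one, minimum-cost assignment between a few candidate junctions and rendered line endpoints. It is an illustration only: the Euclidean cost, the function name `match_junctions_to_endpoints`, and the brute-force search are assumptions chosen for readability, and this section does not specify the cost or solver NEAT actually uses. For realistic sizes, a Hungarian-style solver such as `scipy.optimize.linear_sum_assignment` would replace the exhaustive loop.

```python
import itertools
import math


def match_junctions_to_endpoints(junctions, endpoints):
    """Brute-force minimum-cost one-to-one assignment.

    Assumes len(junctions) <= len(endpoints) and tiny inputs; the cost is
    the summed Euclidean distance (an assumption, not the paper's loss).
    """
    n = len(junctions)
    best_cost, best_assign = math.inf, None
    # Try every way of giving each junction its own distinct endpoint.
    for perm in itertools.permutations(range(len(endpoints)), n):
        cost = sum(math.dist(junctions[i], endpoints[perm[i]])
                   for i in range(n))
        if cost < best_cost:
            best_cost, best_assign = cost, perm
    return list(best_assign), best_cost


# Both junctions are closest to endpoint 0, so a per-junction nearest
# neighbour rule would collide; the one-to-one constraint resolves it.
junctions = [(0.0, 0.0, 0.0), (0.4, 0.0, 0.0)]
endpoints = [(0.3, 0.0, 0.0), (1.0, 0.0, 0.0)]
assignment, total_cost = match_junctions_to_endpoints(junctions, endpoints)
print(assignment, round(total_cost, 3))
```

The one-to-one constraint is what keeps several junctions from collapsing onto the same endpoint, which fits the goal of a sparse, non-redundant junction set.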

Paper Structure (35 sections, 14 equations, 12 figures, 8 tables)

Figures (12)

  • Figure 1: Illustrative overview of the problem of 3D wireframe reconstruction. Given a set of posed images and the corresponding 2D wireframe detection results (input), the proposed NEAT estimates the 3D wireframe representation of the scene (output).
  • Figure 2: The proposed NEAT field learning framework for 3D wireframe reconstruction. The top shows the neural design of the NEAT MLP and the predefined $N$ global junctions. These two components are "attracted" to each other by junction-to-line bipartite matching, resulting in a rendering-yet-distillation formulation: the NEAT MLP renders a dense representation of 3D line segments, which the learned 3D global junctions then distill into the reconstructed wireframe.
  • Figure 3: Two cases of noisy and redundant 3D line segments learned by line segment rendering. Case (a) uses the input images and line segments introduced in Figure 1, and case (b) is the real-world DTU-24 scene.
  • Figure 4: Optimization Process of 3D Junction Perceiving (top) from the noisy 3D line cloud (bottom) on the DTU-23 scene.
  • Figure 5: Visualization of 3D wireframe reconstruction on 12 scenes from the DTU dataset [DTU-AanaesJVTD16] and 4 scenes from the BlendedMVS dataset [BMVS-dataset]. For each scene, we show its line segment view (with the junctions hidden) in black, and its wireframe view with the junctions colored blue. For the comparison, please see our video: https://youtu.be/qtBQYbOpVpc
  • ...and 7 more figures