Generating 3D House Wireframes with Semantics

Xueqi Ma, Yilin Liu, Wenjun Zhou, Ruowei Wang, Hui Huang

TL;DR

This work tackles unconditional generation of semantically enriched 3D house wireframes. It introduces a wire-based representation and semantic sequencing, coupled with a two-stage pipeline: a graph-based autoencoder learns a quantized geometric vocabulary and a transformer decoder autoregressively generates semantically ordered wire segments via BFS grouping. Key innovations include Local Multi-Head Attention in the encoder, Residual LFQ quantization, and a coarse-to-fine transformer to produce coherent line and vertex embeddings. Experimental results on a newly created 3D house wireframe dataset show superior accuracy, novelty, and semantic fidelity compared to baselines, with qualitative analyses and user studies supporting practical usefulness for CAD and VR applications.
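
To make the quantization step above more concrete, here is a minimal sketch of residual lookup-free quantization in plain Python. It is not the authors' implementation. It assumes that each level quantizes every dimension to a signed constant, that the sign bits are read as a binary code index, and that the constant halves at each residual level (a convention found in some open-source residual LFQ implementations, not confirmed by this summary). The names `lfq` and `residual_lfq`, the number of levels, and the vector size are all hypothetical.

```python
def lfq(vector, scale=1.0):
    """Lookup-free quantization: keep only the sign of each dimension.

    Returns the quantized vector (+scale or -scale per dimension) and an
    integer index whose i-th bit is 1 when dimension i is positive.
    """
    bits = [1 if x > 0 else 0 for x in vector]
    quantized = [scale if b else -scale for b in bits]
    index = sum(b << i for i, b in enumerate(bits))
    return quantized, index


def residual_lfq(vector, num_levels=2):
    """Apply LFQ repeatedly to the residual left by the previous level.

    Assumption (illustrative): the quantization scale halves at each level.
    Returns the summed reconstruction and one code index per level.
    """
    residual = list(vector)
    reconstruction = [0.0] * len(vector)
    indices = []
    for level in range(num_levels):
        quantized, index = lfq(residual, scale=2.0 ** -level)
        indices.append(index)
        reconstruction = [r + q for r, q in zip(reconstruction, quantized)]
        residual = [r - q for r, q in zip(residual, quantized)]
    return reconstruction, indices


# A 3-dimensional feature yields one code index per residual level.
reconstruction, indices = residual_lfq([0.8, -0.3, 1.4])
print(indices)         # [5, 6]
print(reconstruction)  # [0.5, -0.5, 1.5]
```

Under these assumptions each level adds one index per feature, so a line segment is described by a short tuple of integers rather than by an entry in an explicitly stored codebook.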

Abstract

We present a new approach for generating 3D house wireframes with semantic enrichment using an autoregressive model. Unlike conventional generative models that independently process vertices, edges, and faces, our approach employs a unified wire-based representation for improved coherence in learning 3D wireframe structures. By re-ordering wire sequences based on semantic meanings, we facilitate seamless semantic integration during sequence generation. Our two-phase technique merges a graph-based autoencoder with a transformer-based decoder to learn latent geometric tokens and generate semantic-aware wireframes. Through iterative prediction and decoding during inference, our model produces detailed wireframes that can be easily segmented into distinct components, such as walls, roofs, and rooms, reflecting the semantic essence of the shape. Empirical results on a comprehensive house dataset validate the superior accuracy, novelty, and semantic fidelity of our model compared to existing generative models. More results and details can be found on https://vcc.tech/research/2024/3DWire.

Paper Structure (18 sections, 2 equations, 10 figures, 4 tables)

Figures (10)

  • Figure 1: Our method creates 3D wireframes through autoregressive sampling from a trained transformer model, generating tokens from a learned geometric vocabulary. These tokens are decoded into line segments to form the final wireframe. Based on the nodes' connectivity, the resulting wireframes can be easily split into multiple parts, such as walls, roofs, and rooms, reflecting the underlying semantic meaning of shapes.
  • Figure 2: Pipeline of learning the geometric vocabulary of line segments.
  • Figure 3: Pipeline of the transformer training. First, the encoder extracts features from the wireframe, and these features are quantized with Residual LFQ (Yu et al., 2023) to obtain the code indices of each line segment. The indices are then split and transformed into code embeddings for vertices ($C_v$), from which the code embeddings of line segments ($C_l$) are recovered. The coarse transformer progressively predicts the line embeddings (Pred line emb), and the fine transformer refines them into vertex embeddings (Pred vtx embs). Finally, a mapper transforms the predicted embeddings back to codebook indices, which are supervised with a loss function so that the model generates high-quality wireframes.
  • Figure 4: Qualitative comparison of 3D house wireframes. Compared to baselines, our method produces valid wireframes with high geometric fidelity and greater simplicity.
  • Figure 5: The resulting wireframe can be easily converted into a mesh model.
  • ...and 5 more figures
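
The Figure 1 caption notes that a generated wireframe can be split into parts based on node connectivity. A minimal sketch of one way to do this follows, assuming that line segments belonging to the same part share at least one endpoint and that different parts share none; the paper's actual grouping rule may differ. The function name `split_by_connectivity` and the tuple-based segment format are hypothetical.

```python
from collections import defaultdict, deque


def split_by_connectivity(segments):
    """Group line segments into parts with a breadth-first search.

    segments: list of (vertex_a, vertex_b) pairs, where each vertex is a
    hashable coordinate tuple. Two segments are treated as connected when
    they share an endpoint (an assumption made for this sketch).
    Returns a list of parts, each a list of segment indices.
    """
    # Map every vertex to the segments that touch it.
    touching = defaultdict(list)
    for seg_id, (a, b) in enumerate(segments):
        touching[a].append(seg_id)
        touching[b].append(seg_id)

    seen = set()
    parts = []
    for start in range(len(segments)):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        part = []
        while queue:
            seg_id = queue.popleft()
            part.append(seg_id)
            # Visit every unseen segment that shares an endpoint.
            for vertex in segments[seg_id]:
                for neighbor in touching[vertex]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        queue.append(neighbor)
        parts.append(part)
    return parts


# Two segments that meet at (1, 0, 0), plus one isolated segment.
example = [
    ((0, 0, 0), (1, 0, 0)),
    ((1, 0, 0), (1, 1, 0)),
    ((5, 5, 5), (6, 5, 5)),
]
print(split_by_connectivity(example))  # [[0, 1], [2]]
```

Exact coordinate matching is used here for simplicity; decoded vertices with small numerical differences would need to be merged or rounded before grouping.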