
MGP-KAD: Multimodal Geometric Priors and Kolmogorov-Arnold Decoder for Single-View 3D Reconstruction in Complex Scenes

Luoxi Zhang, Chun Xie, Itaru Kitahara

TL;DR

This work addresses single-view 3D reconstruction in complex real-world scenes by fusing RGB information with category-aware geometric priors and decoding with a hybrid decoder based on Kolmogorov–Arnold Networks (KAN). The method, MGP-KAD, builds adaptable geometric priors through a two-stage prototype learning pipeline and fuses them dynamically with image features via attention, while the KAN-based decoder provides high-capacity, multi-scale, nonlinear surface reconstruction. Differentiable rendering during training further refines geometry and appearance, and final surfaces are extracted with Marching Cubes. On Pix3D, MGP-KAD achieves state-of-the-art performance, reducing Chamfer Distance by 9.86%, increasing F-score by 6.03%, and improving Normal Consistency by 12.2%, which indicates robust reconstruction of fine geometric details in complex scenes.
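For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of Chamfer Distance and F-score on small point sets. It assumes one common convention (mean of unsquared nearest-neighbour distances in both directions, and a distance threshold `tau` for the F-score); the exact definitions, sampling density, and threshold used in the paper are not stated here, and the function names are illustrative.

```python
import math

def nearest_distances(src, dst):
    """For each point in src, Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance (assumed convention: sum of the two
    directional means of unsquared nearest-neighbour distances)."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return sum(d_pred) / len(d_pred) + sum(d_gt) / len(d_gt)

def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pred) / len(d_pred)
    recall = sum(d <= tau for d in d_gt) / len(d_gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy point sets: the third predicted point is displaced by 0.5 along z.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.5)]

print(chamfer_distance(pred, gt))   # lower is better
print(f_score(pred, gt, tau=0.1))   # higher is better
```

Lower Chamfer Distance and higher F-score both indicate a closer match between the predicted and ground-truth surfaces. In practice the point sets are sampled densely from meshes and a spatial index replaces the brute-force search.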

Abstract

Single-view 3D reconstruction in complex real-world scenes is challenging due to noise, object diversity, and limited dataset availability. To address these challenges, we propose MGP-KAD, a novel multimodal feature fusion framework that integrates RGB features with geometric priors to enhance reconstruction accuracy. The geometric priors are generated by sampling and clustering ground-truth object data, producing class-level features that are dynamically adjusted during training to improve geometric understanding. Additionally, we introduce a hybrid decoder based on Kolmogorov-Arnold Networks (KAN) to overcome the limitations of traditional linear decoders in processing complex multimodal inputs. Extensive experiments on the Pix3D dataset demonstrate that MGP-KAD achieves state-of-the-art (SOTA) performance, significantly improving geometric integrity, smoothness, and detail preservation. Our work provides a robust and effective solution for advancing single-view 3D reconstruction in complex scenes.
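The idea of fusing an image feature with class-level geometric priors through attention can be illustrated with a small sketch. This is not the paper's architecture: it assumes a single image feature vector acting as the query, a list of prototype vectors acting as both keys and values, no learned projections, and a simple residual sum for the fusion. The name `fuse_with_priors` is hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def fuse_with_priors(image_feature, prototypes):
    """Scaled dot-product attention from one image feature over prototype
    features, followed by a residual sum (an assumed fusion rule)."""
    scale = math.sqrt(len(image_feature))
    weights = softmax([dot(image_feature, p) / scale for p in prototypes])
    # Attention-weighted average of the prototype vectors.
    prior = [
        sum(w * p[i] for w, p in zip(weights, prototypes))
        for i in range(len(image_feature))
    ]
    fused = [f + g for f, g in zip(image_feature, prior)]
    return fused, weights

# Toy example: the image feature aligns with the first prototype.
image_feature = [1.0, 0.0]
prototypes = [[1.0, 0.0], [0.0, 1.0]]
fused, weights = fuse_with_priors(image_feature, prototypes)
print(weights)  # the first prototype receives the larger weight
print(fused)
```

The attention weights let the prototype that best matches the image feature dominate the prior, which is one way to read "dynamically fuses" in the summary above. A learned model would add query, key, and value projections and typically operate on many spatial tokens rather than one vector.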

Paper Structure

This paper contains 21 sections, 4 equations, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: Three-stage reconstruction pipeline: (A) Offline prototype library construction via representative shape selection, (B) Online feature extraction and geometric prior retrieval, (C) Feature fusion and surface-optimized decoding.
  • Figure 2: 2D t-SNE Visualization with Cluster Separation
  • Figure 3: Qualitative results. Examples from the Pix3D dataset.