Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction

Mingyang Li, Yimeng Fan, Changsong Liu, Lixue Xu, Xin Wang, Yanyan Liu, Wei Zhang

TL;DR

IPDRecon introduces an image-plane decoding framework that exploits intra-view geometric priors to reduce reliance on multi-view back-projection for indoor scene reconstruction. It combines a Pixel-level Confidence Encoder, an Affine Compensation Module, and an Image-Plane Spatial Decoder to decode distance, position, and affine-invariant geometric features from single views, then fuses them with multi-view constraints through a state-space, geometry-aware cost volume. In experiments on ScanNet V2 and in cross-domain tests, IPDRecon remains stable under sparse views (CV = 0.24%, PRR = 99.7%, Max Drop = 0.42%) while reaching a Precision of 0.797 and an F-score of 0.722 on 3D reconstruction metrics. The results indicate that intra-view geometric information can substantially improve view-invariant indoor reconstruction and support robust performance in view-limited practical scenarios.

Abstract

Volume-based indoor scene reconstruction methods offer superior generalization capability and real-time deployment potential. However, existing methods rely on multi-view pixel back-projection ray intersections as weak geometric constraints to determine spatial positions. As a result, reconstruction quality depends heavily on input view density, and performance degrades in overlapping regions and unobserved areas. To address these limitations, we reduce dependency on inter-view geometric constraints by exploiting spatial information within individual views. We propose an image-plane decoding framework with three core components: a Pixel-level Confidence Encoder, an Affine Compensation Module, and an Image-Plane Spatial Decoder. These modules decode the three-dimensional structural information that the physical imaging process encodes in images. The framework preserves spatial geometric features, including edges, hollow structures, and complex textures, and significantly enhances view-invariant reconstruction. Experiments on indoor scene reconstruction datasets confirm superior reconstruction stability: our method maintains nearly identical quality when the view count is reduced by 40%, achieving a coefficient of variation of 0.24%, a performance retention rate of 99.7%, and a maximum performance drop of 0.42%. These results demonstrate that exploiting intra-view spatial information provides a robust solution for view-limited scenarios in practical applications.
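
The three stability figures quoted above can be illustrated with a short sketch. The abstract does not give the formulas, so the definitions below are assumptions: the coefficient of variation is taken as the population standard deviation divided by the mean, the performance retention rate (PRR) as the score at the sparsest view setting relative to the densest, and the maximum drop as the gap between the best and worst scores relative to the best. The function name `stability_metrics` and the sample scores are hypothetical and are not results from the paper.

```python
import statistics


def stability_metrics(scores):
    """Summarize how stable a quality score is across view densities.

    `scores` is ordered from the densest to the sparsest view setting.
    All three definitions are assumptions made for illustration; the
    paper may define PRR and Max Drop differently.
    """
    mean = statistics.fmean(scores)
    # Coefficient of variation: spread relative to the mean, in percent.
    cv = statistics.pstdev(scores) / mean * 100.0
    # Assumed PRR: sparsest-setting score relative to the densest one.
    prr = scores[-1] / scores[0] * 100.0
    # Assumed Max Drop: largest gap between best and worst score.
    best, worst = max(scores), min(scores)
    max_drop = (best - worst) / best * 100.0
    return cv, prr, max_drop


# Hypothetical F-scores at decreasing view counts (not from the paper).
example_scores = [0.720, 0.722, 0.719, 0.718]
cv, prr, max_drop = stability_metrics(example_scores)
print(f"CV = {cv:.2f}%, PRR = {prr:.1f}%, Max Drop = {max_drop:.2f}%")
```

Under these assumed definitions PRR and Max Drop need not sum to 100%, because one compares the two endpoint settings while the other compares the best and worst settings overall.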

Paper Structure

This paper contains 20 sections, 12 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Visualization of Reconstruction Quality. Previous methods suffer from severe artifacts and incomplete structures under sparse views. By leveraging single-view geometric priors through PCE and ACM modules, IPDRecon achieves superior reconstruction quality with better structural detail preservation.
  • Figure 2: Architecture of IPDRecon. Given a series of posed images, we use a 2D backbone network to generate coarse-to-fine 2D features $F_{i}^{r}$. Subsequently, through the IPD-Projection stage and the multi-view fusion stage, we obtain the feature volume $V^{r}$. Finally, we use a 3D backbone network to regress the scene surface.
  • Figure 3: According to Lambert's law, the interaction of reflected light within a scene contributes to imaging. Consequently, the information of a single pixel is formed by the linear superposition of multiple light rays from the spatial domain.
  • Figure 4: Demonstration of Pixel-level Confidence Encoder (PCE) calculation process.
  • Figure 5: Visual representation of four affine transformations. Rotation and reflection alter the direction of light propagation while preserving path length and angular relationships between rays. Shearing transformation modifies the angular relationships between light rays while maintaining propagation direction and path length. Translation preserves all three properties: direction of propagation, path length, and angular relationships between light rays.
  • ...and 3 more figures
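
The angular claims in the Figure 5 caption can be checked numerically with a minimal 2D sketch. It treats two light rays as direction vectors and compares their lengths and the angle between them before and after each transformation. The matrices, the ray directions, and the helper names (`apply`, `length`, `angle_between`) are illustrative assumptions, not the paper's Affine Compensation Module. The sketch does not model the caption's path-length claim for shearing, which depends on how the paper parameterizes rays.

```python
import math


def apply(m, v):
    """Multiply a 2x2 matrix (tuple of rows) by a 2D vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])


def length(v):
    return math.hypot(v[0], v[1])


def angle_between(u, v):
    """Unsigned angle between two vectors, in degrees."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = dot / (length(u) * length(v))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


theta = math.radians(30.0)
rotation = ((math.cos(theta), -math.sin(theta)),
            (math.sin(theta), math.cos(theta)))
reflection = ((1.0, 0.0), (0.0, -1.0))  # mirror across the x-axis
shear = ((1.0, 0.5), (0.0, 1.0))        # horizontal shear

ray_a = (1.0, 0.0)
ray_b = (1.0, 1.0)
base_angle = angle_between(ray_a, ray_b)

for name, m in (("rotation", rotation),
                ("reflection", reflection),
                ("shear", shear)):
    a2, b2 = apply(m, ray_a), apply(m, ray_b)
    print(name,
          "lengths:", round(length(a2), 4), round(length(b2), 4),
          "angle:", round(angle_between(a2, b2), 2))

# Translation moves points, so the direction between two points is unchanged.
p, q, t = (0.0, 0.0), (1.0, 1.0), (3.0, -2.0)
p2 = (p[0] + t[0], p[1] + t[1])
q2 = (q[0] + t[0], q[1] + t[1])
direction_before = (q[0] - p[0], q[1] - p[1])
direction_after = (q2[0] - p2[0], q2[1] - p2[1])
```

Rotation and reflection are orthogonal maps, so they keep both ray lengths and the angle between the rays, while the shear matrix changes that angle. Translation leaves the difference between any two points unchanged, so the direction, length, and relative angles of the ray between them are all preserved.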