Semantic Scene Completion with Multi-Feature Data Balancing Network

Mona Alawadh; Mahesan Niranjan; Hansung Kim

TL;DR

MDBNet tackles Semantic Scene Completion (SSC) from a single RGB-D input by fusing 2D RGB semantics with 3D geometry in a dual-head architecture. The 2D branch uses a SegFormer encoder to extract RGB features, which are projected into 3D and fused with F-TSDF in a 3D CNN branch. That branch employs Identity Transformation within full pre-activation Residual Modules (ITRM), which apply a Tanh activation on the identity paths. A combined, reweighted loss balances 2D and 3D supervision, using K-means-derived voxel weights to address intra-class diversity and inter-class ambiguity; uncertainty is quantified via k-fold cross-validation. MDBNet achieves state-of-the-art mIoU on NYUv2 and NYUCAD, with improved occlusion handling and robustness across fusion strategies, and offers a competitive trade-off between accuracy and computational cost compared with methods that rely on heavier priors, such as SPAwN.
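
As a rough illustration of the ITRM idea summarized above, a full pre-activation residual unit can be written with a Tanh applied to the identity path. This is a sketch inferred from the summary, not the paper's exact formulation: the symbols $x_l$ and $\mathcal{F}$ are introduced here for illustration, and the BN, ReLU, Conv ordering inside the residual branch is an assumption that follows the standard full pre-activation design.

```latex
x_{l+1} = \tanh(x_l) + \mathcal{F}(x_l), \qquad
\mathcal{F}(x_l) = \mathrm{Conv}\big(\mathrm{ReLU}(\mathrm{BN}(\mathrm{Conv}(\mathrm{ReLU}(\mathrm{BN}(x_l)))))\big)
```

Since Tanh maps its input into (-1, 1), the skip connection carries a bounded, sign-preserving version of the input rather than the raw value.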

Abstract

Semantic Scene Completion (SSC) is a critical task in computer vision, used in applications such as virtual reality (VR). SSC aims to construct detailed 3D models from partial views by transforming a single 2D image into a 3D representation and assigning each voxel a semantic label. The main challenge lies in completing 3D volumes with limited information, compounded by data imbalance, inter-class ambiguity, and intra-class diversity in indoor scenes. To address these challenges, we propose the Multi-Feature Data Balancing Network (MDBNet), a dual-head model for RGB and depth data (F-TSDF) inputs. Our hybrid encoder-decoder architecture with identity transformation in a pre-activation residual module (ITRM) effectively manages diverse signals within F-TSDF. We evaluate RGB feature fusion strategies and use a combined loss function: cross-entropy for 2D RGB features and weighted cross-entropy for 3D SSC predictions. MDBNet surpasses comparable state-of-the-art (SOTA) methods on the NYU datasets, demonstrating the effectiveness of our approach.
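
The combined loss described above (plain cross-entropy on the 2D head plus weighted cross-entropy on the 3D head) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `lambda_2d` balancing factor, and the choice to normalize by the sum of the voxel weights are all hypothetical, and the per-voxel weights are simply passed in rather than derived from K-means.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits_batch, labels, weights=None):
    """Weighted mean cross-entropy over a batch of pixels or voxels.

    With weights=None every element counts equally (plain cross-entropy).
    Normalizing by the weight sum is an assumption made for this sketch.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    weight_sum = sum(weights)
    if weight_sum == 0:
        return 0.0
    loss = 0.0
    for logits, label, weight in zip(logits_batch, labels, weights):
        loss -= weight * math.log(softmax(logits)[label])
    return loss / weight_sum


def combined_loss(logits_2d, labels_2d, logits_3d, labels_3d,
                  voxel_weights, lambda_2d=1.0):
    """Sum of the 2D loss and the weighted 3D loss.

    lambda_2d is a hypothetical balancing factor, not taken from the paper.
    """
    loss_2d = cross_entropy(logits_2d, labels_2d)
    loss_3d = cross_entropy(logits_3d, labels_3d, voxel_weights)
    return lambda_2d * loss_2d + loss_3d


if __name__ == "__main__":
    # Two pixels with 3 classes each; three voxels with 3 classes each.
    pixels = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
    voxels = [[1.0, 0.0, 0.0], [0.1, 2.0, 0.4], [0.3, 0.2, 1.1]]
    # A weight of 0.0 drops a voxel from the 3D term entirely.
    print(combined_loss(pixels, [0, 1], voxels, [0, 1, 2], [1.0, 2.0, 0.0]))
```

In this sketch, raising a voxel's weight increases its share of the 3D term, which is the mechanism a reweighting scheme would use to emphasize rare or ambiguous classes.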
