3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction
Christopher B. Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
TL;DR
The paper addresses the problem of reconstructing 3D objects from a small number of images, possibly taken from uncalibrated viewpoints, by learning an end-to-end mapping from images to 3D shapes. It introduces 3D-R2N2, a unified encoder–3D-LSTM–decoder architecture that incrementally refines a voxel-based reconstruction as more views become available, trained on synthetic data without image annotations or object class labels. Key contributions include a 3D Convolutional LSTM whose units connect only to their spatial neighbors, a 3D deconvolutional decoder, and experiments showing that single-view reconstruction outperforms state-of-the-art single-view methods and that multi-view reconstruction still succeeds where traditional SFM/SLAM fails. The approach also generalizes to real-world images and performs competitively with, or better than, MVS when views are sparse or surfaces are textureless, which makes it useful for fast 3D prototyping and recognition under varied conditions.
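The 3D Convolutional LSTM described in the TL;DR can be sketched in NumPy. This is a minimal illustration under assumptions not stated above: a 4×4×4 grid of LSTM cells, a per-cell projection of one shared encoder feature vector per view, a 3×3×3 convolution over neighboring hidden states (the "local connectivity"), and no output gate (h = tanh(s)). All function and parameter names (`conv3d_same`, `lstm3d_step`, `W_f`, `U_f`, and so on) are hypothetical, the sizes are toy values, and the decoder is omitted.

```python
import numpy as np

# Minimal sketch of a 3D convolutional LSTM update. Names, sizes, the 3x3x3
# neighborhood and the missing output gate are assumptions for illustration.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv3d_same(h, kernel):
    """Zero-padded 3x3x3 convolution (cross-correlation) over the cell grid.

    h:      (G, G, G, Nh) hidden states, one vector per grid cell
    kernel: (3, 3, 3, Nh, Nh) weights shared by all cells
    Each output cell only sees its own state and its 26 neighbors.
    """
    G = h.shape[0]
    padded = np.pad(h, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(h)
    for dx in range(3):
        for dy in range(3):
            for dz in range(3):
                window = padded[dx:dx + G, dy:dy + G, dz:dz + G]
                out += window @ kernel[dx, dy, dz]
    return out

def init_params(G, Nx, Nh, seed=0):
    """Random toy parameters: per-cell input weights W_*, shared conv kernels U_*."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "s"):
        params["W_" + gate] = rng.normal(0.0, 0.1, size=(G, G, G, Nx, Nh))
        params["U_" + gate] = rng.normal(0.0, 0.1, size=(3, 3, 3, Nh, Nh))
        params["b_" + gate] = np.zeros(Nh)
    return params

def lstm3d_step(x, h_prev, s_prev, params):
    """One view's update: forget gate f, input gate i, memory s, hidden h."""
    def pre(gate):
        # Each cell projects the shared feature x with its own weights,
        # then adds the convolved neighboring hidden states and a bias.
        proj = np.einsum("x,abcxh->abch", x, params["W_" + gate])
        return proj + conv3d_same(h_prev, params["U_" + gate]) + params["b_" + gate]

    f = sigmoid(pre("f"))
    i = sigmoid(pre("i"))
    s = f * s_prev + i * np.tanh(pre("s"))
    h = np.tanh(s)  # assumed: no output gate
    return h, s

G, Nx, Nh = 4, 16, 8
params = init_params(G, Nx, Nh)
h = np.zeros((G, G, G, Nh))
s = np.zeros((G, G, G, Nh))
# Three stand-in encoder features, one per view, fed in sequence.
views = np.random.default_rng(1).normal(size=(3, Nx))
for x in views:
    h, s = lstm3d_step(x, h, s, params)
print(h.shape)  # (4, 4, 4, 8)
```

Because every step reads and writes the same grid of memory cells `s`, each additional view updates the stored state rather than starting over; a 3D deconvolutional decoder (not shown) would then upsample `h` into the voxel grid.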
Abstract
Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most previous work, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single-view reconstruction, and ii) enables the 3D reconstruction of objects in situations where traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
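The abstract states that the output is a 3D occupancy grid. As a sketch, assuming the decoder produces two logits (empty, occupied) per voxel on a 32×32×32 grid, the following NumPy code turns those logits into occupancy probabilities, scores them with a voxel-wise cross-entropy, and binarizes them at 0.5 to compute voxel intersection-over-union (IoU). The function names, grid size, threshold, and toy cube data are illustrative assumptions, not values taken from the text above.

```python
import numpy as np

# Hypothetical post-processing for a voxel decoder that emits two logits
# (empty, occupied) per voxel; grid size and threshold are assumptions.

def occupancy_probs(logits):
    """Per-voxel softmax over the last axis; returns P(occupied), shape (D, D, D)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e[..., 1] / e.sum(axis=-1)

def voxel_cross_entropy(p_occ, target, eps=1e-12):
    """Mean binary cross-entropy between P(occupied) and a 0/1 target grid."""
    t = target.astype(float)
    return -np.mean(t * np.log(p_occ + eps) + (1 - t) * np.log(1 - p_occ + eps))

def voxel_iou(pred, target):
    """Intersection over union of two boolean occupancy grids."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return inter / union if union else 1.0

D = 32
target = np.zeros((D, D, D), dtype=bool)
target[8:24, 8:24, 8:24] = True  # a 16^3 cube as toy ground truth

# Toy logits that favor "occupied" inside the cube and "empty" outside.
logits = np.zeros((D, D, D, 2))
logits[..., 1] = np.where(target, 2.0, -2.0)

p_occ = occupancy_probs(logits)
pred = p_occ > 0.5  # binarize into an occupancy grid
ce = voxel_cross_entropy(p_occ, target)

pred2 = pred.copy()
pred2[8:10] = False  # drop two 16x16 slabs of the cube
print(voxel_iou(pred, target))   # 1.0
print(voxel_iou(pred2, target))  # 0.875
```

IoU compares binarized grids, so it depends on the chosen threshold, whereas the cross-entropy is computed directly on the probabilities and is therefore usable as a differentiable training loss.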
