ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai, Varun Jampani, James M. Rehg
TL;DR
ZeroShape tackles single-image zero-shot 3D shape reconstruction by directly regressing a view-centric occupancy field instead of sampling from a generative model. The model combines three components: a depth and camera-intrinsics estimator, a differentiable unprojection that lifts the estimated depth into a projection map, and a projection-guided cross-attention reconstructor. Training proceeds in two stages and ends with 3D occupancy supervision. The authors also curate a large, standardized evaluation benchmark from three real-world 3D datasets (OmniObject3D, Ocrtoc3D, and Pix3D), while synthetic assets from ShapeNet and Objaverse supply the training data. On this benchmark, ZeroShape achieves state-of-the-art results while using significantly less data and compute than prior generative methods. The results suggest that efficient regression-based models remain competitive for zero-shot 3D reconstruction, and the benchmark gives the community a larger, lower-variance evaluation resource.
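To make the unprojection step concrete, the following is a minimal sketch of lifting a depth map to a per-pixel map of 3D points under a pinhole camera. It is an illustration under stated assumptions, not the authors' implementation: the function name `unproject_depth`, the pixel-center convention, and the example intrinsics are all hypothetical, and NumPy is used for brevity (the same arithmetic is differentiable when written in an autodiff framework).

```python
import numpy as np

def unproject_depth(depth, K):
    """Lift an (H, W) depth map to an (H, W, 3) map of camera-frame points.

    Assumes a pinhole camera with 3x3 intrinsics K and depth measured
    along the camera z-axis (both are assumptions of this sketch).
    """
    H, W = depth.shape
    # Pixel-center coordinates: u runs along the width, v along the height.
    u, v = np.meshgrid(np.arange(W) + 0.5, np.arange(H) + 0.5)
    # Homogeneous pixel coordinates [u, v, 1] for every pixel: (H, W, 3).
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    # Back-project each pixel to a ray with z = 1: K^{-1} [u, v, 1]^T.
    rays = pixels @ np.linalg.inv(K).T
    # Scale each ray by its depth to obtain the 3D point.
    return rays * depth[..., None]

# Hypothetical intrinsics: focal length 100 px, principal point (32, 32).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0, 0.0, 1.0]])
depth = np.full((64, 64), 2.0)  # a flat surface 2 units from the camera
points = unproject_depth(depth, K)
```

Each entry `points[v, u]` holds the 3D location of the surface seen at that pixel, so the result keeps the image layout while carrying geometry. That is the sense in which such a map can serve as spatially aligned input to a downstream reconstructor.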
Abstract
We study the problem of single-image zero-shot 3D shape reconstruction. Recent works learn zero-shot shape reconstruction through generative modeling of 3D assets, but these models are computationally expensive at train and inference time. In contrast, the traditional approach to this problem is regression-based, where deterministic models are trained to directly regress the object shape. Such regression methods possess much higher computational efficiency than generative methods. This raises a natural question: is generative modeling necessary for high performance, or conversely, are regression-based approaches still competitive? To answer this, we design a strong regression-based model, called ZeroShape, based on the converging findings in this field and a novel insight. We also curate a large real-world evaluation benchmark, with objects from three different real-world 3D datasets. This evaluation benchmark is more diverse and an order of magnitude larger than what prior works use to quantitatively evaluate their models, aiming at reducing the evaluation variance in our field. We show that ZeroShape not only achieves superior performance over state-of-the-art methods, but also demonstrates significantly higher computational and data efficiency.
