Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D
Yan Luo, Advaith Ravishankar, Serena Liu, Yutong Yang, Mengyu Wang
TL;DR
The paper addresses the challenge of obtaining reliable 3D anatomical representations from inexpensive 2D medical scans by benchmarking zero-shot, single-slice image-to-3D reconstruction with SAM3D and four diffusion-based baselines on six medical and two natural datasets. It introduces a zero-shot pipeline that converts masked 2D slices into 3D point clouds and, after ICP alignment to the ground truth, evaluates them with voxel-based and distance-based metrics. Results show that voxel-based metrics are generally weak because of the depth ambiguity inherent in single views, but SAM3D achieves the strongest global geometric similarity (lower Chamfer Distance and Earth Mover's Distance) than the other models, indicating that some geometric priors transfer while multi-view strategies are still needed. The work underscores the ill-posed nature of single-slice medical 3D reconstruction and motivates multi-view aggregation to enable reliable clinical 3D inference from common 2D imaging modalities.
Abstract
A 3D understanding of anatomy is central to diagnosis and treatment planning, yet volumetric imaging remains costly and subject to long wait times. Image-to-3D foundation models could address this issue by reconstructing 3D data from 2D modalities. Current foundation models are trained on natural image distributions to reconstruct naturalistic objects from a single image by leveraging geometric priors across pixels. However, it is unclear whether these learned geometric priors transfer to medical data. In this study, we present a controlled zero-shot benchmark of single-slice medical image-to-3D reconstruction across five state-of-the-art image-to-3D models: SAM3D, Hunyuan3D-2.1, Direct3D, Hi3DGen, and TripoSG. These models are evaluated on six medical datasets spanning anatomical and pathological structures and on two natural datasets, using voxel-based metrics and point cloud distance metrics. Across medical datasets, voxel-based overlap remains moderate for all models, consistent with a depth reconstruction failure mode when inferring volume from a single slice. In contrast, global distance metrics show more separation between methods: SAM3D achieves the strongest overall topological similarity to ground-truth medical 3D data, while alternative models are more prone to over-simplification of the reconstruction. Our results quantify the limits of single-slice medical reconstruction and highlight depth ambiguity caused by the planar nature of 2D medical data, motivating multi-view image-to-3D reconstruction to enable reliable medical 3D inference.
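To make the two metric families named in the abstract concrete, the following is a minimal sketch of one point cloud distance metric (Chamfer Distance) and one voxel-based overlap metric (occupancy IoU). It is not the authors' implementation: the function names, the squared-Euclidean convention for Chamfer Distance, and the grid resolution of 32 are assumptions chosen for illustration, and the ICP alignment step and Earth Mover's Distance are omitted.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumption: squared Euclidean distances, averaged per direction and summed.
    Other conventions (unsquared, or halved) exist and change the scale.
    """
    # Pairwise squared distances, shape (N, M). Fine for small clouds;
    # a KD-tree would be preferable for large ones.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    # Mean nearest-neighbour distance from a to b, plus from b to a.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def voxel_iou(a: np.ndarray, b: np.ndarray, resolution: int = 32) -> float:
    """Intersection-over-union of occupancy grids built from two point sets.

    Assumption: both clouds are voxelized on one shared cubic grid spanning
    their joint bounding box; `resolution` is a hypothetical default.
    """
    pts = np.vstack([a, b])
    lo = pts.min(axis=0)
    # Longest side of the joint bounding box (guarded against zero extent).
    extent = max(float((pts.max(axis=0) - lo).max()), 1e-12)

    def occupied(p: np.ndarray) -> set:
        # Map each point to the integer index of the voxel containing it.
        idx = np.floor((p - lo) / extent * (resolution - 1)).astype(int)
        return set(map(tuple, idx))

    va, vb = occupied(a), occupied(b)
    return len(va & vb) / len(va | vb)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Ground truth: points on a unit sphere. Prediction: the same points
    # flattened onto the z = 0 plane, mimicking a depth-collapsed output.
    sphere = rng.normal(size=(500, 3))
    sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)
    flat = sphere * np.array([1.0, 1.0, 0.0])
    print("Chamfer Distance:", chamfer_distance(flat, sphere))
    print("Voxel IoU:", voxel_iou(flat, sphere))
```

The toy comparison at the bottom flattens a sphere onto a plane to imitate a reconstruction with no recovered depth. Under these assumptions, the voxel overlap is penalized heavily because occupancy off the plane is missed entirely, which is one way to picture why overlap scores can stay low even when the in-plane outline is correct.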
