Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D

Yan Luo, Advaith Ravishankar, Serena Liu, Yutong Yang, Mengyu Wang

TL;DR

The paper addresses the challenge of obtaining reliable 3D anatomical representations from inexpensive 2D medical scans by benchmarking zero-shot single-slice image-to-3D reconstruction across SAM3D and four diffusion-based baselines on six medical and two natural datasets. It introduces a zero-shot pipeline that converts masked 2D slices into 3D point clouds and evaluates them with voxel and distance metrics after ICP alignment to the ground truth. Results show that voxel-based metrics are generally weak because a single view leaves depth ambiguous, but SAM3D achieves the strongest global geometric similarity (lowest Chamfer Distance and Earth Mover's Distance) among the compared models. This suggests that some geometric priors transfer to medical data, while also pointing to the need for multi-view strategies. The work underscores the ill-posed nature of single-slice medical 3D reconstruction and motivates multi-view aggregation to enable reliable clinical 3D inference from common 2D imaging modalities.

Abstract

A 3D understanding of anatomy is central to diagnosis and treatment planning, yet volumetric imaging remains costly and subject to long wait times. Image-to-3D foundation models could address this issue by reconstructing 3D data from 2D modalities. Current foundation models are trained on natural image distributions to reconstruct naturalistic objects from a single image by leveraging geometric priors across pixels. However, it is unclear whether these learned geometric priors transfer to medical data. In this study, we present a controlled zero-shot benchmark of single-slice medical image-to-3D reconstruction across five state-of-the-art image-to-3D models: SAM3D, Hunyuan3D-2.1, Direct3D, Hi3DGen, and TripoSG. These are evaluated across six medical datasets spanning anatomical and pathological structures and two natural datasets, using voxel-based metrics and point cloud distance metrics. Across medical datasets, voxel-based overlap remains moderate for all models, consistent with a depth reconstruction failure mode when inferring volume from a single slice. In contrast, global distance metrics show more separation between methods: SAM3D achieves the strongest overall topological similarity to ground-truth medical 3D data, while alternative models are more prone to oversimplified reconstructions. Our results quantify the limits of single-slice medical reconstruction and highlight depth ambiguity caused by the planar nature of 2D medical data, motivating multi-view image-to-3D reconstruction to enable reliable medical 3D inference.
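
To make the two metric families named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' evaluation code. It assumes point clouds are lists of `(x, y, z)` tuples that have already been aligned, a hypothetical `voxel_size` parameter for the occupancy grid, and the unsquared, mean-based form of Chamfer Distance; the paper may use a different convention (for example squared distances).

```python
import math

def voxelize(points, voxel_size):
    """Map each 3D point to the integer index of the voxel that contains it."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}

def voxel_iou_dice(pred_points, gt_points, voxel_size=1.0):
    """Voxel IoU and Dice between the occupancy sets of two point clouds."""
    pred = voxelize(pred_points, voxel_size)
    gt = voxelize(gt_points, voxel_size)
    inter = len(pred & gt)
    union = len(pred | gt)
    total = len(pred) + len(gt)
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

def chamfer_distance(a, b):
    """Symmetric Chamfer Distance (assumed form): mean nearest-neighbour
    distance from a to b plus the same quantity from b to a."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(a, b) + one_way(b, a)

# Toy example: the prediction misplaces one of three points.
gt = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
pred = [(0, 0, 0), (1, 0, 0), (0, 0, 1)]
iou, dice = voxel_iou_dice(pred, gt)
cd = chamfer_distance(pred, gt)
```

The brute-force nearest-neighbour search is quadratic in the number of points, which is acceptable for illustration but would normally be replaced by a spatial index for full-resolution clouds. The toy example also hints at why the two metric families can disagree: a single misplaced point removes a whole voxel from the overlap while shifting the Chamfer Distance only slightly.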

Paper Structure (9 sections, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Zero-shot single-slice medical image-to-3D pipeline. A NIfTI scan and NIfTI segmentation mask are preprocessed to generate a single masked 2D image and a ground truth 3D point cloud. The 2D masked input is passed through SAM3D to get a reconstructed 3D point cloud. We evaluate the reconstruction against the extracted 3D ground truth using the metrics defined in the paper's metrics section.
  • Figure 2: Voxel-based reconstruction quality metrics across medical datasets. F1 Score, Voxel IoU, and Voxel Dice values evaluated on coronal and axial slice reconstructions for five 3D generation models (Direct3D, Hi3DGen, Hunyuan3D-2.1, SAM3D, TripoSG) across six medical imaging datasets (AeroPath, BTCV, Duke Cspine, MSD_Brain, MSD_Liver, MSD_Lung). Solid bars represent coronal slice metrics; hatched bars represent axial slice metrics. Error bars indicate standard deviation.
  • Figure 3: Point cloud distance metrics across medical datasets. Chamfer Distance and Earth Mover's Distance (EMD) evaluated on coronal and axial slice reconstructions for five 3D generation models (Direct3D, Hi3DGen, Hunyuan3D-2.1, SAM3D, TripoSG) across six medical imaging datasets (AeroPath, BTCV, Duke Cspine, MSD_Brain, MSD_Liver, MSD_Lung). Solid bars represent coronal slice metrics; hatched bars represent axial slice metrics. Lower values indicate better reconstruction quality. Error bars indicate standard deviation.
  • Figure 4: Voxel and distance-based reconstruction quality metrics across natural datasets. F1 Score, Voxel IoU, Voxel Dice, Chamfer Distance, and EMD values evaluated for five 3D generation models (Direct3D, Hi3DGen, Hunyuan3D-2.1, SAM3D, TripoSG) across two natural datasets (Google Scanned Objects and Animal3D).
  • Figure 5: Coronal-view qualitative reconstructions from SAM3D on AeroPath and BTCV.
  • ...and 2 more figures
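
The preprocessing step described in the Figure 1 caption (one masked 2D slice plus a ground truth point cloud, both derived from a scan and its segmentation mask) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the authors' pipeline: the scan and mask are assumed to be already loaded from NIfTI into nested lists indexed `[z][y][x]`, the slice is chosen as the one with the largest mask area (a hypothetical selection rule that the caption does not specify), and `spacing` is a hypothetical voxel-spacing argument.

```python
def largest_mask_slice(mask):
    """Index along the first axis of the slice with the most foreground voxels.
    (Assumed selection rule, used here only for illustration.)"""
    areas = [sum(sum(1 for v in row if v) for row in sl) for sl in mask]
    return max(range(len(mask)), key=areas.__getitem__)

def masked_slice(volume, mask, z):
    """Return slice z of the volume with every pixel outside the mask set to 0."""
    return [[val if m else 0 for val, m in zip(vrow, mrow)]
            for vrow, mrow in zip(volume[z], mask[z])]

def mask_to_point_cloud(mask, spacing=(1.0, 1.0, 1.0)):
    """Foreground voxel coordinates as (z, y, x) points scaled by voxel spacing."""
    sz, sy, sx = spacing
    return [(z * sz, y * sy, x * sx)
            for z, sl in enumerate(mask)
            for y, row in enumerate(sl)
            for x, v in enumerate(row) if v]

# Toy 2x2x2 volume and mask standing in for a loaded scan and segmentation.
volume = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
mask = [[[0, 1], [0, 0]], [[1, 1], [0, 1]]]

z = largest_mask_slice(mask)
image_2d = masked_slice(volume, mask, z)
gt_points = mask_to_point_cloud(mask, spacing=(2.0, 1.0, 1.0))
```

In this sketch `image_2d` plays the role of the masked 2D input to the reconstruction model and `gt_points` the role of the ground truth cloud that the reconstruction is later compared against.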