Gaussian Heritage: 3D Digitization of Cultural Heritage with Integrated Object Segmentation

Mahtab Dahaghin, Myrna Castillo, Kourosh Riahidehkordi, Matteo Toso, Alessio Del Bue

TL;DR

This work tackles affordable 3D digitization of cultural heritage objects using only RGB images from consumer devices. It introduces a pipeline that combines 3D Gaussian Splatting with open-vocabulary segmentation via Grounding DINO and SAM. The pipeline jointly learns geometry, appearance, and per-object segmentation through the loss terms $\mathcal{L}_{rendering}$, $\lambda_{clustering}\mathcal{L}_{CC}$, and $\mathcal{L}_{reg}$, and each Gaussian carries a segmentation feature in $\mathbb{R}^{16}$. The method enables instance-aware 3D reconstruction and automated object extraction, including a convex-hull refinement step (gaussian_grouping) and multi-view rendering, and is released as a Dockerized pipeline. Evaluation on public datasets reports quantitative improvements in segmentation metrics (e.g., mIoU and mBIoU), along with qualitative examples of accurate object-level 3D models, pointing to practical use in museum settings.
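The object-extraction step can be illustrated with a minimal sketch. This is not the authors' code. It assumes, following the Gaussian Grouping method that the gaussian_grouping label appears to refer to, that a linear classifier maps each 16-dimensional feature to object logits. It also assumes that convex-hull refinement keeps every Gaussian whose center falls inside the hull of the (outlier-filtered) Gaussians assigned to the target object. The function names, the classifier parameters, and the mean-plus-standard-deviation outlier rule are hypothetical. The sketch uses NumPy and scipy.spatial.Delaunay.

```python
import numpy as np
from scipy.spatial import Delaunay


def classify_gaussians(features, weight, bias):
    """Assign each Gaussian an object ID from its 16-d segmentation feature.

    Hypothetical linear classifier: features (N, 16) @ weight (16, K) + bias (K,)
    gives K object logits per Gaussian; the argmax is the predicted ID.
    """
    logits = features @ weight + bias
    return np.argmax(logits, axis=1)


def extract_object(centers, ids, target_id, outlier_factor=1.0):
    """Boolean mask over all Gaussians that belong to `target_id`.

    centers: (N, 3) Gaussian means; ids: (N,) predicted object IDs.
    """
    # 1. Gaussians whose predicted ID matches the requested object.
    pts = centers[ids == target_id]
    # 2. Hypothetical outlier rule: drop points farther from the centroid
    #    than mean + outlier_factor * std of those distances.
    dist = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    inliers = pts[dist <= dist.mean() + outlier_factor * dist.std()]
    # 3. Convex-hull refinement: keep every Gaussian (whatever its ID)
    #    whose center lies inside the hull of the inliers. Delaunay needs
    #    at least 4 non-coplanar points; find_simplex returns -1 outside.
    hull = Delaunay(inliers)
    return hull.find_simplex(centers) >= 0
```

The hull test recovers Gaussians inside an object that received a noisy label. The trade-off is that a strongly concave object also absorbs anything lying within its hull.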

Abstract

The creation of digital replicas of physical objects has valuable applications for the preservation and dissemination of tangible cultural heritage. However, existing methods are often slow and expensive, and they require expert knowledge. We propose a pipeline to generate a 3D replica of a scene using only RGB images (e.g., photos of a museum) and then extract a model for each item of interest (e.g., pieces in the exhibit). We do this by leveraging recent advances in novel view synthesis and Gaussian Splatting, modified to enable efficient 3D segmentation. This approach needs no manual annotation, and the visual inputs can be captured with a standard smartphone, making it both affordable and easy to deploy. We provide an overview of the method and a baseline evaluation of object segmentation accuracy. The code is available at https://mahtaabdn.github.io/gaussian_heritage.github.io/.
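The two segmentation metrics named in the TL;DR can be made concrete with a NumPy sketch. We assume mIoU is the per-object mask IoU averaged over objects. We also assume mBIoU is the mean Boundary IoU, which restricts IoU to a band of width d pixels inside each mask's contour. The band width d=2 and all function names are illustrative choices, not values taken from the paper.

```python
import numpy as np


def erode(mask, iterations):
    """Binary erosion with a 3x3 square structuring element (pure NumPy).

    Pixels outside the image count as background, so pixels on the image
    border always fall in the boundary band.
    """
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, constant_values=False)
        # A pixel survives only if all 9 pixels of its 3x3 window are set.
        out = np.logical_and.reduce([
            padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
        ])
    return out


def iou(a, b):
    """Mask IoU; two empty masks count as a perfect match."""
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else np.logical_and(a, b).sum() / union


def boundary_iou(gt, pred, d=2):
    """IoU restricted to the band of width d just inside each mask's contour."""
    gt, pred = np.asarray(gt, dtype=bool), np.asarray(pred, dtype=bool)
    gt_band = gt & ~erode(gt, d)
    pred_band = pred & ~erode(pred, d)
    return iou(gt_band, pred_band)


def mean_scores(gts, preds, d=2):
    """Return (mIoU, mBIoU) averaged over paired per-object masks."""
    ious = [iou(g, p) for g, p in zip(gts, preds)]
    bious = [boundary_iou(g, p, d) for g, p in zip(gts, preds)]
    return float(np.mean(ious)), float(np.mean(bious))
```

Boundary IoU penalizes errors along object outlines more heavily than plain IoU does. This matters for heritage objects, whose edges determine how clean an extracted model looks.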

Paper Structure (8 sections, 1 equation, 2 figures, 1 table)

Figures (2)

  • Figure 1: Given a set of images, we a) use a web interface to upload them to a local server, where they are processed to generate b) 2D instance segmentation masks and c) a sparse 3D model. Using these, we train d) a model that captures appearance and 3D segmentation of the scene, from which we can e) extract a model for each object.
  • Figure 2: Sample views extracted from the "Family" scene of the Tanks and Temples dataset [Knapitsch2017], using the text labels a) "man statue" and b) "mother and baby statue".
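Two steps implied by the figures can be sketched for illustration. First, assume segmentation features are rendered the same way colors are in 3D Gaussian Splatting. A pixel's feature is then the front-to-back alpha blend of the 16-dimensional features of the Gaussians along its ray. Second, a text label such as "man statue" could be tied to a 3D object by picking the rendered object ID that best overlaps the Grounding DINO + SAM mask for that label. Both functions are hypothetical helpers written against NumPy, not the pipeline's actual interface.

```python
import numpy as np


def composite_feature(features, alphas):
    """Alpha-blend per-Gaussian features front to back for one pixel.

    features: (N, 16) segmentation features of the Gaussians on the pixel's
    ray, sorted near to far; alphas: (N,) their opacities at that pixel.
    Same rule as color compositing in 3D Gaussian Splatting:
        F = sum_i f_i * a_i * prod_{j<i} (1 - a_j)
    """
    alphas = np.asarray(alphas, dtype=float)
    # Transmittance before Gaussian i: product of (1 - a_j) for all j < i.
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    weights = alphas * transmittance
    return weights @ features


def pick_object_id(id_map, prompt_mask):
    """Return the rendered object ID that best overlaps a text-grounded mask.

    id_map: (H, W) integer IDs rendered from the trained model;
    prompt_mask: (H, W) booleans, e.g. a SAM mask for "man statue".
    Returns -1 if the prompt mask is empty.
    """
    best_id, best_iou = -1, 0.0
    for obj_id in np.unique(id_map[prompt_mask]):
        region = id_map == obj_id
        inter = np.logical_and(region, prompt_mask).sum()
        union = np.logical_or(region, prompt_mask).sum()
        if inter / union > best_iou:
            best_id, best_iou = int(obj_id), inter / union
    return best_id
```

Rendering the blended feature at every pixel and applying a classifier to it would produce the id_map that pick_object_id consumes. Matching by IoU, rather than by raw pixel count, keeps a large neighboring object from winning on size alone.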