Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation

Andrea Ranieri, Giorgio Palmieri, Silvia Biasotti

TL;DR

This paper tackles automated crack segmentation on cultural heritage (CH) artifacts using semantic segmentation with U‑Net architectures and multiple CNN encoders. It uses the OmniCrack30k dataset for training and quantitative evaluation, and conducts an out‑of‑distribution qualitative test on real statues and monuments. ConvNeXt V2 Huge consistently delivers the strongest segmentation performance and generalizes well to unseen CH contexts, though at a greater computational cost; data augmentation sometimes reduces rather than improves accuracy. The work highlights the need for CH‑focused datasets and proposes future directions including 3D data, synthetic data, diffusion‑based domain adaptation, and closer collaboration with conservators.

Abstract

This paper addresses the critical need for automated crack detection in the preservation of cultural heritage through semantic segmentation. We present a comparative study of U-Net architectures, using various convolutional neural network (CNN) encoders, for pixel-level crack identification on statues and monuments. A comparative quantitative evaluation is performed on the test set of the OmniCrack30k dataset [1] using popular segmentation metrics, including mean Intersection over Union (mIoU), the Dice coefficient, and the Jaccard index. This is complemented by an out-of-distribution qualitative evaluation on an unlabeled test set of real-world cracked statues and monuments. Our findings provide insights into the capabilities of different CNN-based encoders for fine-grained crack segmentation. We show that the models exhibit promising generalization to unseen cultural heritage contexts, despite never having been explicitly trained on images of statues or monuments.
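The abstract lists mIoU, the Dice coefficient, and the Jaccard index. As a rough illustration of how these are commonly computed on binary crack masks, the sketch below implements them in plain Python. For a single class, IoU and the Jaccard index are the same quantity, |A ∩ B| / |A ∪ B|. The helper names (`_counts`, `iou`, `dice`, `mean_iou`), the 1 = crack / 0 = background label convention, the averaging scheme (mIoU as the unweighted mean of background and crack IoU on one image), and the choice to score a class absent from both masks as 1.0 are all assumptions for illustration; the paper's exact protocol (for example, per-image versus dataset-level accumulation) is not stated in this section.

```python
from typing import Sequence, Tuple

# A 2-D binary mask: 1 = crack pixel, 0 = background (assumed label convention).
Mask = Sequence[Sequence[int]]


def _counts(pred: Mask, target: Mask, cls: int) -> Tuple[int, int, int]:
    """Hypothetical helper: (|A & B|, |A|, |B|), where A/B are pixels labelled `cls`."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("pred and target must have the same shape")
    inter = p_sum = t_sum = 0
    for prow, trow in zip(pred, target):
        for p, t in zip(prow, trow):
            in_pred, in_target = p == cls, t == cls
            inter += int(in_pred and in_target)
            p_sum += int(in_pred)
            t_sum += int(in_target)
    return inter, p_sum, t_sum


def iou(pred: Mask, target: Mask, cls: int = 1) -> float:
    """IoU (identical to the Jaccard index) for one class: |A & B| / |A | B|."""
    inter, p_sum, t_sum = _counts(pred, target, cls)
    union = p_sum + t_sum - inter
    # Assumption: a class absent from both masks counts as a perfect match.
    return 1.0 if union == 0 else inter / union


def dice(pred: Mask, target: Mask, cls: int = 1) -> float:
    """Dice coefficient for one class: 2|A & B| / (|A| + |B|)."""
    inter, p_sum, t_sum = _counts(pred, target, cls)
    denom = p_sum + t_sum
    return 1.0 if denom == 0 else 2 * inter / denom


def mean_iou(pred: Mask, target: Mask, classes: Sequence[int] = (0, 1)) -> float:
    """mIoU: unweighted mean of per-class IoU (here background and crack)."""
    return sum(iou(pred, target, c) for c in classes) / len(classes)


# Toy 2x4 example: 3 crack pixels in each mask, 2 of them overlapping.
target = [[0, 1, 1, 0],
          [0, 1, 0, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 0]]

print(round(iou(pred, target), 4))       # 0.5    (2 / 4)
print(round(dice(pred, target), 4))      # 0.6667 (4 / 6)
print(round(mean_iou(pred, target), 4))  # 0.5833 ((0.5 + 4/6) / 2)
```

On the same pair of masks, Dice and IoU are linked by Dice = 2J / (1 + J), so on a single image they always rank models identically; they can diverge once scores are averaged over many images.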

Paper Structure

This paper contains 20 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Out-of-distribution predictions obtained with the ConvNeXt V2 Huge U-Net model (no data augmentation) on images of black and white marble (and thus closer to the images in the training set).
  • Figure 2: The same images as in Fig. 1, processed with the other three fine-tuned models (again without data augmentation).
  • Figure 3: Out-of-distribution predictions obtained with the ConvNeXt V2 Huge U-Net model (no data augmentation) on images depicting statues (and thus quite distant from the images in the training set).
  • Figure 4: The same images as in Fig. 3, processed with the other three fine-tuned models (again without data augmentation).