Automated Monitoring of Cultural Heritage Artifacts Using Semantic Segmentation
Andrea Ranieri, Giorgio Palmieri, Silvia Biasotti
TL;DR
This paper tackles automated crack segmentation on cultural heritage (CH) artifacts using U‑Net architectures with multiple CNN encoders. It uses the OmniCrack30k dataset for training and quantitative evaluation, and adds an out‑of‑distribution qualitative test on real statues and monuments. ConvNeXt V2 Huge consistently delivers the strongest segmentation performance and generalizes well to unseen CH contexts, though at greater computational cost; data augmentation sometimes reduces accuracy. The work highlights the need for CH‑focused datasets and proposes directions including 3D data, synthetic data, diffusion‑based domain adaptation, and closer collaboration with conservators.
Abstract
This paper addresses the critical need for automated crack detection in the preservation of cultural heritage through semantic segmentation. We present a comparative study of U-Net architectures, using various convolutional neural network (CNN) encoders, for pixel-level crack identification on statues and monuments. A quantitative evaluation is performed on the test set of the OmniCrack30k dataset [1] using popular segmentation metrics including Mean Intersection over Union (mIoU), Dice coefficient, and Jaccard index. This is complemented by an out-of-distribution qualitative evaluation on an unlabeled test set of real-world cracked statues and monuments. Our findings provide valuable insights into the capabilities of different CNN-based encoders for fine-grained crack segmentation. We show that the models exhibit promising generalization capabilities to unseen cultural heritage contexts, despite never having been explicitly trained on images of statues or monuments.
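To make the metrics named above concrete, the following is a minimal sketch of how IoU, Dice, and a two-class mIoU can be computed for binary crack masks. It is an illustration under stated assumptions, not the evaluation code used in the paper: the function names, the nested-list mask format (1 = crack, 0 = background), the `eps` smoothing term, and the choice to average IoU over the crack and background classes are all hypothetical. Note that the Jaccard index is another name for IoU, so a single function covers both.

```python
from typing import Sequence, Tuple

Mask = Sequence[Sequence[int]]  # assumed format: rows of 0/1 pixel labels


def confusion_counts(pred: Mask, target: Mask) -> Tuple[int, int, int]:
    """Count true positives, false positives and false negatives for the crack class."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def iou(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Intersection over Union (Jaccard index): TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    return (tp + eps) / (tp + fp + fn + eps)


def dice(pred: Mask, target: Mask, eps: float = 1e-7) -> float:
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, target)
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def invert(mask: Mask) -> list:
    """Swap the labels so that background becomes the positive class."""
    return [[1 - v for v in row] for row in mask]


def mean_iou(pred: Mask, target: Mask) -> float:
    """Assumed mIoU definition: average of crack IoU and background IoU."""
    crack = iou(pred, target)
    background = iou(invert(pred), invert(target))
    return (crack + background) / 2


# Toy 3x4 masks: two crack pixels agree, one is a false positive, one is missed.
pred = [[0, 1, 1, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
target = [[0, 1, 0, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]

crack_iou = iou(pred, target)        # about 0.5   (2 / 4)
crack_dice = dice(pred, target)      # about 0.667 (4 / 6)
two_class_miou = mean_iou(pred, target)  # about 0.65 ((0.5 + 0.8) / 2)
```

For a single class the two scores are linked by Dice = 2·IoU / (1 + IoU), so they always rank models in the same order; Dice simply penalizes the thin, few-pixel structures typical of cracks less harshly than IoU does.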
