ARTeFACT: Benchmarking Segmentation Models on Diverse Analogue Media Damage

Daniela Ivanova, Marco Aversa, Paul Henderson, John Williamson

TL;DR

This paper tackles robust damage detection in analogue media for cultural heritage preservation by introducing ARTeFACT, a diverse dataset with pixel-accurate damage masks for 15 damage types across 10 materials and 4 content categories, plus textual descriptions. It performs an extensive benchmark of zero-shot, supervised, unsupervised, and text-guided segmentation methods—including SAM, SegFormer, UPerNet variants, DINOv2, and diffusion-based approaches—to evaluate cross-media generalization. The results reveal substantial generalization gaps: no method consistently detects damage across all media types, with SAM requiring impractical prompt engineering, supervised models struggling with multiclass tasks, and diffusion-based methods offering only limited precision. The work provides the first-of-its-kind, publicly available benchmark and taxonomy for damaged analogue media, highlighting the need for new, robust damage-detection pipelines in conservation practice.

Abstract

Accurately detecting and classifying damage in analogue media such as paintings, photographs, textiles, mosaics, and frescoes is essential for cultural heritage preservation. While machine learning models excel at correcting degradation if the damage operator is known a priori, we show that they fail to robustly predict where the damage is even after supervised training; thus, reliable damage detection remains a challenge. Motivated by this, we introduce ARTeFACT, a dataset for damage detection in diverse types of analogue media, with over 11,000 annotations covering 15 kinds of damage across various subjects, media, and historical provenance. Furthermore, we contribute human-verified text prompts describing the semantic contents of the images, and derive additional textual descriptions of the annotated damage. We evaluate CNN, Transformer, diffusion-based segmentation models, and foundation vision models in zero-shot, supervised, unsupervised and text-guided settings, revealing their limitations in generalising across media types. Our dataset is available at https://daniela997.github.io/ARTeFACT/ as the first-of-its-kind benchmark for analogue media damage detection and restoration.

Paper Structure

This paper contains 25 sections, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Examples from our dataset of damaged artwork, categorised by Material (rows 1 and 2) and Content (row 3). Annotation colours correspond to different types of damage. Note the diversity of media and content, and pixel-accurate damage masks.
  • Figure 2: Damage types found in our dataset, demonstrating the variety of shape, scale and severity of damage.
  • Figure 3: Overview of the prevalence and severity of different damage types in the dataset.
  • Figure 4: Qualitative results for SAM on zero-shot damage segmentation. The top row shows the initial segments predicted by SAM from N prompts (unique colour per prompt); the bottom row shows the same segments after each is assigned a binary class (Clean or Damaged) via an oracle.
  • Figure 5: Qualitative comparison for supervised binary (top row) and multiclass (bottom row) damage segmentation.
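
The oracle step described in the Figure 4 caption can be illustrated with a minimal sketch. This listing does not state the exact rule the oracle uses, so the majority-overlap criterion, the `threshold` parameter, and the function names `oracle_label_segments` and `binary_prediction` are all assumptions made for illustration, not the authors' implementation. The idea is that each predicted segment is compared against the ground-truth damage mask and labelled Damaged when enough of its pixels are annotated as damaged.

```python
from typing import Dict, List

# 2D grid of ints. In a segment map each value is a segment id;
# in a damage mask each value is 1 (damaged) or 0 (clean).
Grid = List[List[int]]


def oracle_label_segments(
    segment_map: Grid, damage_mask: Grid, threshold: float = 0.5
) -> Dict[int, str]:
    """Label each segment using the ground truth (hypothetical oracle rule).

    A segment is 'Damaged' when the fraction of its pixels marked as
    damaged in `damage_mask` is strictly greater than `threshold`.
    """
    total: Dict[int, int] = {}
    damaged: Dict[int, int] = {}
    for seg_row, gt_row in zip(segment_map, damage_mask):
        for seg_id, gt in zip(seg_row, gt_row):
            total[seg_id] = total.get(seg_id, 0) + 1
            damaged[seg_id] = damaged.get(seg_id, 0) + (1 if gt else 0)
    return {
        seg_id: "Damaged" if damaged[seg_id] / total[seg_id] > threshold else "Clean"
        for seg_id in total
    }


def binary_prediction(segment_map: Grid, labels: Dict[int, str]) -> Grid:
    """Turn per-segment labels back into a per-pixel binary mask."""
    return [[1 if labels[s] == "Damaged" else 0 for s in row] for row in segment_map]


# Toy 3x3 example: three segments (ids 0, 1, 2) and a ground-truth mask.
segment_map = [
    [0, 0, 1],
    [0, 2, 1],
    [2, 2, 1],
]
damage_mask = [
    [0, 0, 1],
    [0, 0, 1],
    [0, 1, 0],
]

labels = oracle_label_segments(segment_map, damage_mask)
prediction = binary_prediction(segment_map, labels)
```

In this toy case segment 1 has two of its three pixels annotated as damaged and is labelled Damaged, while segments 0 and 2 are labelled Clean. Because the labels come from the ground truth, a result of this kind is better read as an upper bound on what the segments could achieve than as a deployable detector.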