Identification of Stone Deterioration Patterns with Large Multimodal Models

Daniele Corradetti, Jose Delgado Rodrigues

TL;DR

The study addresses automated recognition of stone deterioration patterns in world heritage sites using large multimodal models, proposing a taxonomy-derived benchmarking framework and a 354-image test set. It evaluates three leading models (GPT-4omni, Claude 3 Opus, and Gemini 1.5 Pro) through a prompt-based pipeline without fine-tuning, revealing pattern-dependent variability and interpretability challenges. Target-pattern identification rates are modest (GPT-4omni 42.1%, Gemini 1.5 Pro 38.9%, Claude 3 Opus 24.3%). In open-ended identification the models show different strengths but limited practical reliability, underscoring domain-specific gaps. The work provides a baseline for AI-assisted conservation, highlights the need for domain-focused training or architectural enhancements, and offers public data and code to enable replication and future improvements.

Abstract

The conservation of stone-based cultural heritage sites is a critical concern for preserving cultural and historical landmarks. With the advent of Large Multimodal Models such as GPT-4omni (OpenAI), Claude 3 Opus (Anthropic) and Gemini 1.5 Pro (Google), it is becoming increasingly important to define the operational capabilities of these models. In this work, we systematically evaluate the ability of the main foundational multimodal models to recognise and classify the anomalies and deterioration patterns of stone elements in a way that is useful for the conservation and restoration of world heritage. After defining a taxonomy of the main stone deterioration patterns and anomalies, we asked the foundational models to identify these patterns in a curated selection of 354 highly representative images of stone-built heritage, offering them a carefully selected set of labels to choose from. The results, which vary with the type of pattern, allowed us to identify the strengths and weaknesses of these models in the field of heritage conservation and restoration.
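To make the closed-label setup concrete, the following Python sketch shows one way to phrase such a query and to keep only the answers that belong to the allowed label list. It is a minimal sketch under stated assumptions: the label set is illustrative and is not the taxonomy defined in the paper, the prompt wording is hypothetical, `build_prompt` and `parse_labels` are hypothetical helper names, and the actual call to each model's API is deliberately left out.

```python
# Illustrative label set only; this is NOT the taxonomy defined in the paper.
LABELS = [
    "crack",
    "scaling",
    "black crust",
    "efflorescence",
    "biological colonization",
    "erosion",
]


def build_prompt(labels: list[str]) -> str:
    """Build a hypothetical closed-set prompt asking the model to pick labels from a fixed list."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "You are assisting a stone conservator. Look at the attached photograph "
        "of a stone-built heritage element and list the deterioration patterns "
        "you can see. Choose ONLY from the following labels, one per line:\n"
        f"{options}"
    )


def parse_labels(reply: str, labels: list[str]) -> set[str]:
    """Keep only reply lines that match an allowed label (case-insensitive, bullets stripped)."""
    allowed = {label.lower(): label for label in labels}
    found = set()
    for line in reply.splitlines():
        # Drop leading bullet characters and surrounding whitespace before matching.
        key = line.strip().lstrip("-*• ").strip().lower()
        if key in allowed:
            found.add(allowed[key])
    return found


# Example with a made-up model reply; "graffiti" is not in the allowed list, so it is ignored.
reply = "- Black crust\n- erosion\n- graffiti"
print(sorted(parse_labels(reply, LABELS)))  # ['black crust', 'erosion']
```

Restricting parsed answers to the allowed list means an out-of-taxonomy answer is discarded rather than counted; whether the study treated such answers the same way is not stated in this summary.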

Paper Structure (14 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1.1: Implemented workflow followed in this study.
  • Figure 2.1: Example of the deterioration patterns used for benchmarking
  • Figure 3.1: Success rate of each model on the "Target deterioration pattern". Yellow is GPT-4omni, light blue is Gemini 1.5 Pro and blue is Claude 3 Opus.
  • Figure 3.2: Success rate in identifying the presence of a deterioration pattern (as openly chosen by the models). Yellow represents GPT-4omni, light blue Gemini 1.5 Pro and blue Claude 3 Opus.
  • Figure 3.3: Example of additional queries to visualise the reproducibility of the answers.
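The success rates plotted in Figure 3.1 are proportions of images. As a hedged illustration, the sketch below computes a target-pattern success rate for one model, per pattern and overall, from hypothetical (target pattern, predicted labels) pairs. It assumes an image counts as a success when its target pattern appears among the labels the model chose; the paper's exact scoring rule and averaging are not given in this summary, so both a per-image (micro) and a per-pattern (macro) average are shown. `success_rates` is a hypothetical helper, and the data are toy values, not results from the study.

```python
from collections import defaultdict


def success_rates(records):
    """Target-pattern success rates for ONE model.

    records: iterable of (target_pattern, predicted_labels) pairs, one per image.
    An image counts as a success if its target pattern is among the predicted
    labels (an assumed scoring rule, not necessarily the paper's).
    Returns (per_pattern, micro, macro):
      per_pattern: {pattern: hits / images for that pattern}
      micro: hits / images over all images
      macro: unweighted mean of the per-pattern rates
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for target, predicted in records:
        totals[target] += 1
        if target in predicted:
            hits[target] += 1
    if not totals:
        return {}, 0.0, 0.0
    per_pattern = {p: hits[p] / totals[p] for p in totals}
    micro = sum(hits.values()) / sum(totals.values())
    macro = sum(per_pattern.values()) / len(per_pattern)
    return per_pattern, micro, macro


# Toy data for illustration only (not results from the study).
records = [
    ("black crust", {"black crust", "erosion"}),
    ("black crust", {"scaling"}),
    ("crack", {"crack"}),
    ("crack", {"crack", "scaling"}),
    ("efflorescence", set()),
]
per_pattern, micro, macro = success_rates(records)
print(per_pattern)  # {'black crust': 0.5, 'crack': 1.0, 'efflorescence': 0.0}
print(micro, macro)  # 0.6 0.5
```

The gap between the two averages in the toy data shows why the averaging rule matters when rates vary by pattern, as they do in this study.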