Vision-Language Based Expert Reporting for Painting Authentication and Defect Detection

Eman Ouda, Mohammed Salah, Arsenii O. Chulkov, Gianfranco Gargiulo, Gian Luca Tartaglia, Stefano Sfarra, Yusra Abdulrahman

Abstract

Authenticity and condition assessment are central to conservation decision-making, yet the interpretation and reporting of thermographic outputs remain largely bespoke and expert-dependent. This complicates comparison across collections and limits systematic integration into conservation documentation. Pulsed Active Infrared Thermography (AIRT) is sensitive to subsurface features such as material heterogeneity, voids, and past interventions. However, its broader adoption is constrained by artifact misinterpretation, inter-laboratory variability, and the absence of standardized, explainable reporting frameworks. Multi-modal thermographic processing techniques are well established, but their integration with structured natural-language interpretation has not yet been explored in cultural heritage. This work presents a fully automated framework that couples thermography with a vision-language model (VLM). It combines multi-modal AIRT analysis with modality-aware textual reporting and requires no human intervention during inference. Thermal sequences are processed using Principal Component Thermography (PCT), Thermographic Signal Reconstruction (TSR), and Pulsed Phase Thermography (PPT). The resulting anomaly masks are fused into a consensus segmentation that emphasizes regions supported by multiple thermal indicators while mitigating boundary artifacts. The fused evidence is provided to the VLM, which generates structured reports describing anomaly location, thermal behavior, and plausible physical interpretations. These reports explicitly acknowledge uncertainty and diagnostic limitations. Evaluation on two marquetry panels demonstrates consistent anomaly detection and stable structured interpretations, indicating reproducibility and generalizability across the samples tested.
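The processing chain summarized in the abstract (three thermographic maps, per-map anomaly masks, consensus fusion) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation. The PCT component index, the PPT frequency bin, the TSR polynomial degree and evaluation point, the median/MAD threshold that turns each map into a mask, and the two-of-three voting rule are all hypothetical choices. Any boundary-specific processing used in the paper is not reproduced. The demo runs on a synthetic 32 x 32 sequence, not on the marquetry data.

```python
import numpy as np


def pct_map(seq, component=1):
    """Principal Component Thermography: spatial map of one SVD component.

    seq: (T, H, W) thermal sequence. Each pixel's time series is standardized,
    then the (time x pixel) matrix is decomposed by SVD. Which component carries
    the defect is data-dependent; component=1 is an assumption.
    """
    T, H, W = seq.shape
    A = seq.reshape(T, H * W)
    A = (A - A.mean(axis=0)) / (A.std(axis=0) + 1e-12)
    _, _, vt = np.linalg.svd(A, full_matrices=False)
    return vt[component].reshape(H, W)


def ppt_phase_map(seq, freq_bin=1):
    """Pulsed Phase Thermography: phase of one DFT bin of each pixel's time series."""
    spectrum = np.fft.rfft(seq, axis=0)
    return np.angle(spectrum[freq_bin])


def tsr_derivative_map(seq, times, degree=5):
    """Thermographic Signal Reconstruction: first log-log derivative of a fitted polynomial.

    Assumes seq holds a positive temperature rise (pre-flash baseline removed).
    Evaluating the derivative at the mean log-time is a hypothetical choice.
    """
    T, H, W = seq.shape
    log_t = np.log(times)
    log_temp = np.log(seq.reshape(T, H * W))
    # Coefficients ordered from lowest to highest power, shape (degree + 1, H * W).
    coef = np.polynomial.polynomial.polyfit(log_t, log_temp, degree)
    x0 = log_t.mean()
    powers = np.arange(1, degree + 1)[:, None]
    deriv = (powers * coef[1:] * x0 ** (powers - 1)).sum(axis=0)
    return deriv.reshape(H, W)


def robust_mask(feature, k=3.5):
    """Flag pixels whose robust z-score (median / scaled MAD) exceeds k in magnitude."""
    med = np.median(feature)
    scale = 1.4826 * np.median(np.abs(feature - med)) + 1e-12
    return np.abs(feature - med) / scale > k


def consensus_mask(masks, min_votes=2):
    """Keep pixels flagged by at least min_votes of the per-modality masks."""
    votes = np.stack(masks).astype(int).sum(axis=0)
    return votes >= min_votes


# Synthetic demo (not the paper's data): a 32 x 32 panel whose central square cools more slowly.
rng = np.random.default_rng(0)
times = 0.1 * np.arange(1, 61)                          # 60 frames at a hypothetical 10 Hz
sound = 1.0 / np.sqrt(times)                            # 1-D semi-infinite cooling curve
defect = sound * (1.0 + 0.8 * (1.0 - np.exp(-times)))   # toy subsurface defect: slower cooling
seq = np.tile(sound[:, None, None], (1, 32, 32))
seq[:, 10:18, 10:18] = defect[:, None, None]
seq += rng.normal(0.0, 0.005, seq.shape)                # additive sensor noise

masks = [
    robust_mask(pct_map(seq)),
    robust_mask(tsr_derivative_map(seq, times)),
    robust_mask(ppt_phase_map(seq)),
]
fused = consensus_mask(masks)
print(int(fused.sum()), "pixels in the consensus mask")
```

Requiring agreement between at least two maps means that an isolated false alarm in one modality does not reach the fused mask. This mirrors, in a simplified form, the abstract's emphasis on regions supported by multiple thermal indicators.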

Paper Structure (13 sections, 8 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Optical images of the Italian marquetry samples analyzed in this study: (a) the "Boy" and (b) the "Girl" panel.
  • Figure 2: Active pulsed infrared thermography setup used for the acquisition of thermal sequences under conservation-safe conditions.
  • Figure 3: Thermography–VLM framework. Multi-modal thermographic maps are fused into a consensus anomaly map and provided, with the optical image, to a VLM that generates structured defect reports and quantitative summaries.
  • Figure 4: Proposed thermography–vision–language framework.
  • Figure 5: Illustrative example of a structured vision-language model (VLM) report generated for the "Boy" marquetry.
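The hand-off described for Figure 3, where the consensus anomaly map is passed to the VLM together with quantitative summaries, could be organized as below. This is a hypothetical sketch. The 4-connected region labelling, the summary fields (`area_px`, `centroid_rc`, `bbox_rc`, `modality_overlap`), and the request layout are illustrative assumptions. The requested report fields only mirror the contents named in the abstract: location, thermal behavior, interpretation, uncertainty, and limitations. No particular VLM API is assumed, and the model call itself is omitted. The demo masks are synthetic.

```python
import json
from collections import deque

import numpy as np


def label_regions(mask):
    """Label 4-connected regions of a boolean mask with a breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    count = 0
    for r, c in zip(*np.nonzero(mask)):
        if labels[r, c]:
            continue  # already assigned to an earlier region
        count += 1
        labels[r, c] = count
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                if inside and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = count
                    queue.append((ny, nx))
    return labels, count


def region_summaries(fused, modality_masks, names=("PCT", "TSR", "PPT")):
    """Per-region quantitative summary of a consensus mask (all field names hypothetical)."""
    labels, count = label_regions(fused)
    regions = []
    for k in range(1, count + 1):
        region = labels == k
        ys, xs = np.nonzero(region)
        area = int(region.sum())
        regions.append({
            "id": k,
            "area_px": area,
            "area_fraction": round(area / region.size, 4),
            "centroid_rc": [round(float(ys.mean()), 1), round(float(xs.mean()), 1)],
            "bbox_rc": [int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())],
            # Fraction of the region's pixels also flagged by each individual modality.
            "modality_overlap": {
                name: round(int((m & region).sum()) / area, 2)
                for name, m in zip(names, modality_masks)
            },
        })
    return regions


def build_report_request(sample_id, regions):
    """Serialize the evidence plus the report fields the VLM is asked to fill (hypothetical layout)."""
    return json.dumps({
        "sample": sample_id,
        "evidence": regions,
        "required_fields": [
            "location", "thermal_behavior", "plausible_interpretation",
            "uncertainty", "diagnostic_limitations",
        ],
        "instruction": "Describe each region using only the supplied evidence and state uncertainty explicitly.",
    }, indent=2)


# Synthetic demo: two consensus regions; TSR supports only the first one.
fused = np.zeros((16, 16), dtype=bool)
fused[2:5, 3:7] = True        # region 1: 3 x 4 = 12 px
fused[10:12, 10:12] = True    # region 2: 2 x 2 = 4 px
tsr = np.zeros_like(fused)
tsr[2:5, 3:7] = True
regions = region_summaries(fused, [fused.copy(), tsr, fused.copy()])
request = build_report_request("synthetic-demo", regions)
print(request)
```

Computing the numeric summaries outside the VLM keeps the quantitative part of each report checkable against the mask. The model is then asked only to describe and interpret evidence it has been given.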