Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs

Leonardi Melo, Luís Gustavo, Dimmy Magalhães, Lucciani Vieira, Mauro Araújo

TL;DR

This study addresses semantic segmentation of rock art petroglyphs from the Poço da Bebidinha site in Brazil by comparing three BEGL-loss–driven U-Net variants. It evaluates BEGL-UNet (baseline), Attention-Residual BEGL-UNet, and SCA-BEGL-UNet using 5-fold cross-validation on 82 annotated images, with the BEGL loss combining edge-enhanced MSE and binary cross-entropy. The Attention-Residual BEGL-UNet achieves the best overall performance (DSC $=0.710$, val loss $=0.067$, recall $=0.854$), with SCA-BEGL-UNet close behind (DSC $=0.707$). These results demonstrate that incorporating attention mechanisms yields consistent improvements in segmentation quality for challenging archaeological imagery, supporting more reliable digital preservation workflows; however, generalization is limited by a single-site dataset, motivating future work with larger, diverse datasets and advanced architectures such as Vision Transformers.

Abstract

This study presents a comparative analysis of three U-Net-based architectures for semantic segmentation of rock art petroglyphs from Brazilian archaeological sites. The investigated architectures were: (1) BEGL-UNet, with the Border-Enhanced Gaussian Loss (BEGL) function; (2) Attention-Residual BEGL-UNet, incorporating residual blocks and gated attention mechanisms; and (3) Spatial Channel Attention BEGL-UNet, which employs spatial-channel attention modules based on the Convolutional Block Attention Module. All implementations employed the BEGL loss function, which combines binary cross-entropy with Gaussian edge enhancement. Experiments were conducted on images from the Poço da Bebidinha Archaeological Complex, Piauí, Brazil, using 5-fold cross-validation. Among the architectures, Attention-Residual BEGL-UNet achieved the best overall performance, with a Dice Score (DSC) of 0.710, a validation loss of 0.067, and a recall of 0.854. Spatial Channel Attention BEGL-UNet obtained comparable performance, with a DSC of 0.707 and a recall of 0.857. The baseline BEGL-UNet registered a DSC of 0.690. These results demonstrate the effectiveness of attention mechanisms for archaeological heritage digital preservation, with relative Dice Score improvements of 2.5-2.9% over the baseline.
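To make the reported metrics concrete, the following is a minimal sketch of how a Dice Score and recall can be computed for a pair of binary masks, and of how the 2.5-2.9% figures follow from the reported DSC values if they are read as relative gains over the baseline. The function names `dice_and_recall` and `relative_gain`, the smoothing constant `eps`, and the toy masks are illustrative assumptions, not the authors' evaluation code.

```python
def dice_and_recall(pred, target, eps=1e-7):
    """Dice score and recall for two binary masks given as nested 0/1 lists.

    `eps` is an assumed smoothing term that avoids division by zero
    when both masks are empty.
    """
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    tp = sum(1 for a, b in zip(p, t) if a == 1 and b == 1)  # true positives
    fp = sum(1 for a, b in zip(p, t) if a == 1 and b == 0)  # false positives
    fn = sum(1 for a, b in zip(p, t) if a == 0 and b == 1)  # false negatives
    dice = (2 * tp + eps) / (2 * tp + fp + fn + eps)
    recall = (tp + eps) / (tp + fn + eps)
    return dice, recall


def relative_gain(score, baseline):
    """Relative improvement of `score` over `baseline`, in percent."""
    return 100.0 * (score - baseline) / baseline


# Toy 3x4 masks (hypothetical, not taken from the dataset).
target = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
pred = [[0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]

dice, recall = dice_and_recall(pred, target)
print(f"toy Dice = {dice:.3f}, toy recall = {recall:.3f}")

# DSC values reported in the abstract.
baseline_dsc = 0.690
for name, dsc in [("Attention-Residual BEGL-UNet", 0.710),
                  ("Spatial Channel Attention BEGL-UNet", 0.707)]:
    print(f"{name}: {relative_gain(dsc, baseline_dsc):.1f}% over baseline")
```

Under this reading, the absolute DSC differences are 0.020 and 0.017, and dividing each by the baseline DSC of 0.690 gives roughly 2.9% and 2.5%.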

Paper Structure

This paper contains 29 sections, 13 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Representative Dataset Samples of Brazilian Petroglyphs. Images show morphological variations including geometric patterns, linear engravings, complex overlapped figures, anthropomorphic representations, and zoomorphic figures with differential erosion.
  • Figure 2: Data Augmentation Examples. Representative transformations applied to the training set: original image, horizontal flip, and elastic deformation with parameters alpha=120 and sigma=6.
  • Figure 3: Comparative Segmentation Results on Sample 15. Progressive improvement in edge definition and morphological accuracy across architectures: BEGL-UNet baseline; Attention-Residual BEGL-UNet, demonstrating superior boundary delineation; and Spatial Channel Attention BEGL-UNet, with comparable edge precision.
  • Figure 4: Representative Failure Cases. Challenging conditions affecting segmentation performance across all three architectures: chronological superposition, where multiple temporal phases of engraving create ambiguous boundaries; mineral deposits and iron oxide crusts occluding petroglyphs; and low radiometric contrast between engraving and gneiss substrate under uncontrolled illumination.