Evaluation of Attention Mechanisms in U-Net Architectures for Semantic Segmentation of Brazilian Rock Art Petroglyphs
Leonardi Melo, Luís Gustavo, Dimmy Magalhães, Lucciani Vieira, Mauro Araújo
TL;DR
This study addresses semantic segmentation of rock art petroglyphs from the Poço da Bebidinha site in Brazil by comparing three BEGL-loss–driven U-Net variants. It evaluates BEGL-UNet (baseline), Attention-Residual BEGL-UNet, and SCA-BEGL-UNet using 5-fold cross-validation on 82 annotated images, with the BEGL loss combining edge-enhanced MSE and binary cross-entropy. The Attention-Residual BEGL-UNet achieves the best overall performance (DSC $=0.710$, val loss $=0.067$, recall $=0.854$), with SCA-BEGL-UNet close behind (DSC $=0.707$). These results demonstrate that incorporating attention mechanisms yields consistent improvements in segmentation quality for challenging archaeological imagery, supporting more reliable digital preservation workflows; however, generalization is limited by a single-site dataset, motivating future work with larger, diverse datasets and advanced architectures such as Vision Transformers.
Abstract
This study presents a comparative analysis of three U-Net-based architectures for semantic segmentation of rock art petroglyphs from Brazilian archaeological sites. The architectures investigated were: (1) BEGL-UNet, trained with the Border-Enhanced Gaussian Loss (BEGL) function; (2) Attention-Residual BEGL-UNet, which incorporates residual blocks and gated attention mechanisms; and (3) Spatial Channel Attention BEGL-UNet (SCA-BEGL-UNet), which employs spatial-channel attention modules based on the Convolutional Block Attention Module. All implementations used the BEGL loss function, which combines binary cross-entropy with Gaussian edge enhancement. Experiments were conducted on images from the Poço da Bebidinha Archaeological Complex, Piauí, Brazil, using 5-fold cross-validation. Among the architectures, Attention-Residual BEGL-UNet achieved the best overall performance, with a Dice Score (DSC) of 0.710, a validation loss of 0.067, and a recall of 0.854. Spatial Channel Attention BEGL-UNet obtained comparable performance, with a DSC of 0.707 and a recall of 0.857. The baseline BEGL-UNet registered a DSC of 0.690. These results demonstrate the effectiveness of attention mechanisms for the digital preservation of archaeological heritage, with relative Dice Score improvements of 2.5-2.9% over the baseline.
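As a rough illustration of the metric reported above, the following minimal sketch computes a Dice Score on flat binary masks and reproduces the 2.5-2.9% figures as relative gains over the baseline DSC of 0.690. The function names `dice_score` and `relative_improvement`, the smoothing constant `eps`, and the toy masks are assumptions made for this example; they are not taken from the authors' implementation, which may compute or average the metric differently.

```python
def dice_score(pred, target, eps=1e-7):
    """Dice similarity coefficient for two flat binary masks (sequences of 0/1).

    eps is an assumed smoothing term that avoids division by zero when
    both masks are empty.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return (2.0 * intersection + eps) / (total + eps)


def relative_improvement(score, baseline):
    """Relative gain of `score` over `baseline`, in percent."""
    return 100.0 * (score - baseline) / baseline


# Toy masks (hypothetical): 1 marks a petroglyph pixel, 0 marks background.
pred_mask = [1, 1, 0, 0]
true_mask = [1, 0, 0, 0]
print(f"toy DSC: {dice_score(pred_mask, true_mask):.3f}")

# DSC values as reported in the abstract.
baseline_dsc = 0.690
reported = [
    ("Attention-Residual BEGL-UNet", 0.710),
    ("SCA-BEGL-UNet", 0.707),
]
for name, dsc in reported:
    gain = relative_improvement(dsc, baseline_dsc)
    print(f"{name}: {gain:.1f}% relative DSC gain")
```

Read this way, the improvements are relative to the baseline score; in absolute terms the gains are 0.017 and 0.020 Dice points.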
