High-Fidelity Mural Restoration via a Unified Hybrid Mask-Aware Transformer

Jincheng Jiang, Qianhao Han, Chi Zhang, Zheng Zheng

Abstract

Ancient murals are valuable cultural artifacts, but many have suffered severe degradation due to environmental exposure, material aging, and human activity. Restoring these artworks is challenging because it requires both reconstructing large missing structures and strictly preserving authentic, undamaged regions. This paper presents the Hybrid Mask-Aware Transformer (HMAT), a unified framework for high-fidelity mural restoration. HMAT integrates Mask-Aware Dynamic Filtering for robust local texture modeling with a Transformer bottleneck for long-range structural inference. To further address the diverse morphology of degradation, we introduce a mask-conditional style fusion module that dynamically guides the generative process. In addition, a Teacher-Forcing Decoder with hard-gated skip connections is designed to enforce fidelity in valid regions and focus reconstruction on missing areas. We evaluate HMAT on the DHMural dataset and a curated Nine-Colored Deer dataset under varying degradation levels. Experimental results demonstrate that the proposed method achieves competitive performance compared to state-of-the-art approaches, while producing more structurally coherent and visually faithful restorations. These findings suggest that HMAT provides an effective solution for the digital restoration of cultural heritage murals.
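
The abstract does not spell out how the hard-gated skip connections are computed. As a minimal sketch of the general idea only, the snippet below composites two arrays with a binary mask so that valid positions are copied through unchanged and only missing positions take generated values. The function name `hard_gated_merge`, the mask convention (1 = undamaged, 0 = missing), and the use of NumPy are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def hard_gated_merge(source, generated, valid_mask):
    """Hypothetical helper: keep `source` where the mask marks a valid
    (undamaged) position and use `generated` everywhere else.

    Assumed convention: valid_mask is 1 for undamaged, 0 for missing.
    """
    # Binarise the mask so the gate is strictly 0 or 1 (a "hard" gate).
    gate = (valid_mask > 0.5).astype(source.dtype)
    # Broadcasting applies the single-channel gate to every channel.
    return gate * source + (1.0 - gate) * generated


# Toy 4x4 RGB example with a 2x2 hole in the centre.
rng = np.random.default_rng(0)
original = rng.random((4, 4, 3))
generated = rng.random((4, 4, 3))
mask = np.ones((4, 4, 1))
mask[1:3, 1:3] = 0.0

restored = hard_gated_merge(original, generated, mask)
```

With a strictly binary gate, undamaged positions are reproduced exactly and the generator only contributes inside the hole, which is the behaviour the abstract attributes to the decoder at a high level.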

Paper Structure

This paper contains 16 sections, 7 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of the proposed Hybrid Mask-Aware Transformer (HMAT). The core architecture is a unified generator comprising a Hybrid Encoder (MADF + Transformer) for robust feature extraction, Mask-Conditional Style Fusion (SF) to dynamically guide synthesis, and a Teacher-Forcing Decoder (TFD) to enforce fidelity to the original in undamaged regions. The resulting structural completion is then passed to a Refinement Network that enhances high-frequency texture details.
  • Figure 2: Qualitative comparison of style dimensionality configurations on the Nine-Colored Deer dataset. To evaluate capacity distribution, we compare our Baseline ($s_{img}=360, s_{latent}=180, s_{mask}=64$) against Equal Capacity ($s_{img}=180, s_{latent}=180, s_{mask}=180$) and Heavy Semantic Bias ($s_{img}=360, s_{latent}=64, s_{mask}=16$).
  • Figure 3: Qualitative comparison with state-of-the-art methods on the DHMural dataset.