DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization

Chengxin Zhao; Hefei Ling; Sijing Xie; Nan Sun; Zongyi Li; Yuxuan Shi; Jiazhong Chen

TL;DR

DBDH tackles the localization of invisible embedded regions in offline-to-online messaging by combining a low-level branch that analyzes high-frequency texture with a high-level context feature extractor. The two branches feed a vertex-detection heatmap and a segmentation mask that is used only during training, enabling precise and robust localization under print-shooting and screen-shooting distortions. Experiments on the WM-SS and WM-PIMoG datasets show state-of-the-art localization accuracy at low computational cost, validating the effectiveness of high-pass texture cues and region-wise supervision. By making the first-stage localization accurate, a critical step before geometric correction and message retrieval, this work improves the reliability of decoding hidden messages.
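To make the dual-head supervision concrete, the sketch below builds both training targets from the same four vertex labels: one Gaussian heatmap per vertex (for the detection head) and a binary quadrilateral mask (for the auxiliary segmentation head), then combines them into one loss. This is a minimal sketch under stated assumptions, not the paper's formulation: the Gaussian width `sigma=2.0`, the MSE heatmap loss, the BCE mask loss, the weight `lam=0.5`, and all function names are hypothetical illustrative choices.

```python
import numpy as np


def vertex_heatmaps(vertices, height, width, sigma=2.0):
    """One Gaussian heatmap per (x, y) vertex, peaking at 1.0 on the vertex pixel.
    sigma is a hypothetical choice, not taken from the paper."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
            for x, y in vertices]
    return np.stack(maps).astype(np.float32)


def quad_mask(vertices, height, width):
    """Binary mask of a convex quadrilateral given its 4 (x, y) vertices in order."""
    ys, xs = np.mgrid[0:height, 0:width]
    sides = []
    for i in range(4):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % 4]
        # Sign of the cross product tells which side of edge (x0,y0)->(x1,y1) a pixel is on.
        sides.append((x1 - x0) * (ys - y0) - (y1 - y0) * (xs - x0))
    sides = np.stack(sides)
    # Inside = same side of all four edges (works for either vertex orientation).
    inside = np.all(sides >= 0, axis=0) | np.all(sides <= 0, axis=0)
    return inside.astype(np.float32)


def decode_vertices(heatmaps):
    """Read each vertex back as the (x, y) argmax location of its heatmap."""
    coords = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        coords.append((int(x), int(y)))
    return coords


def dual_head_loss(pred_heatmaps, pred_mask_logits, vertices, lam=0.5):
    """Hypothetical objective: heatmap MSE + lam * mask binary cross-entropy."""
    _, h, w = pred_heatmaps.shape
    target_hm = vertex_heatmaps(vertices, h, w)
    target_mask = quad_mask(vertices, h, w)
    hm_loss = np.mean((pred_heatmaps - target_hm) ** 2)
    z = pred_mask_logits
    # Numerically stable BCE with logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
    bce = np.mean(np.maximum(z, 0) - z * target_mask + np.log1p(np.exp(-np.abs(z))))
    return float(hm_loss + lam * bce)


verts = [(2, 2), (7, 2), (7, 7), (2, 7)]
print(decode_vertices(vertex_heatmaps(verts, 10, 10)))  # [(2, 2), (7, 2), (7, 7), (2, 7)]
print(int(quad_mask(verts, 10, 10).sum()))              # 36
```

The mask term gives every pixel a label, whereas the heatmap term is concentrated around four points; this is the intuition behind using pixel-level segmentation as auxiliary supervision, with only the vertex output needed at inference.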

Abstract

Embedding invisible hyperlinks or hidden codes in images as a replacement for QR codes has recently become a hot topic. This technology requires first localizing the embedded region in the captured photo before decoding. Existing methods that train models to find the invisible embedded region struggle to localize it accurately, which degrades decoding accuracy. This limitation arises primarily because CNNs are more sensitive to low-frequency signals, whereas the embedded signal is typically high-frequency. Motivated by this observation, this paper proposes a Dual-Branch Dual-Head (DBDH) neural network tailored for the precise localization of invisible embedded regions. Specifically, DBDH uses a low-level texture branch containing 62 high-pass filters to capture the high-frequency signals induced by embedding, and a high-level context branch to extract features that discriminate between embedded and normal regions. DBDH employs a detection head to directly detect the four vertices of the embedded region. In addition, we introduce an extra segmentation head that predicts the mask of the embedded region during training. The segmentation head provides pixel-level supervision, helping the model learn the embedded signals. Based on two state-of-the-art invisible offline-to-online messaging methods, we construct two datasets and corresponding augmentation strategies for training and testing localization models. Extensive experiments demonstrate that the proposed DBDH outperforms existing methods.
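The argument for a dedicated texture branch can be checked numerically: a fixed zero-sum high-pass kernel suppresses smooth, low-frequency image content while strongly responding to a fine high-frequency pattern. The sketch below uses one well-known SRM kernel (the 5x5 "KV" kernel); it is an illustration under stated assumptions, not DBDH's actual filter bank of 62 fixed SRM and Gabor kernels, and the checkerboard is a hypothetical stand-in for a real embedded signal.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# 5x5 "KV" kernel from the SRM (spatial rich model) filter family.
# Every row and column sums to zero, so constants and linear ramps give zero response.
KV = np.array([
    [-1,  2,  -2,  2, -1],
    [ 2, -6,   8, -6,  2],
    [-2,  8, -12,  8, -2],
    [ 2, -6,   8, -6,  2],
    [-1,  2,  -2,  2, -1],
], dtype=np.float64) / 12.0


def high_pass(image, kernel=KV):
    """'Valid' 2-D correlation of a grayscale image with a fixed kernel (no padding)."""
    windows = sliding_window_view(image, kernel.shape)  # shape (H-k+1, W-k+1, k, k)
    return np.einsum("ijkl,kl->ij", windows, kernel)


h, w = 32, 32
ys, xs = np.mgrid[0:h, 0:w]
smooth = 100.0 + 2.0 * xs + 1.5 * ys               # low-frequency "cover" content (a ramp)
pattern = np.where((xs + ys) % 2 == 0, 1.0, -1.0)  # toy high-frequency "embedding", amplitude 1
marked = smooth + pattern

print(np.abs(high_pass(smooth)).max())  # approximately 0: the ramp is suppressed
print(np.abs(high_pass(marked)).max())  # approximately 8.0: the +/-1 pattern is amplified
```

Because the kernels are fixed rather than learned, the branch exposes these residuals from the first layer onward instead of relying on a CNN to discover them.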

Paper Structure (15 sections, 3 equations, 6 figures, 2 tables)

Introduction
Related Work
  Robust watermarking for offline-to-online messaging
Method
  Low-level texture branch
  High-level context branch
  Vertex detection head
  Region segmentation head
  Training Strategy
Experiments
  Datasets and augmentations
  Experimental settings
  Comparison with other methods
  Ablation study
Conclusion

Figures (6)

Figure 1: The application pipeline of offline-to-online messaging in the print/screen-shooting scenario. The white dotted box indicates the embedded region.
Figure 2: Overall architecture of the proposed Dual-Branch Dual-Head (DBDH) network. Dual branches: the low-level texture branch uses fixed SRM and Gabor kernels to extract the high-frequency components of the augmented image, while the high-level context branch uses a ResNet18 to extract features that discriminate between the embedded and normal regions. Dual heads: the vertex detection head detects the four vertices of the embedded region, and the region segmentation head outputs the embedded region's mask, which serves as auxiliary supervision. Blocks with a black border have a stride of 2.
Figure 3: Augmentation strategy for simulating the print-shooting process, which is called Aug-SS.
Figure 4: Post-processing of the WM-SS dataset, which raises the average PSNR of WM-SS to 40.06 dB.
Figure 5: Augmentation strategy for simulating the screen-shooting process, which is called Aug-PIMoG.
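Aug-SS and Aug-PIMoG (Figures 3 and 5) simulate print-shooting and screen-shooting; their exact operations are not listed here. Any such augmentation for a vertex-detection model must apply geometric distortions consistently to the image and to the four vertex labels. The sketch below shows that one ingredient: a random perspective warp, solved as a homography from four corner correspondences (direct linear transform with h33 = 1), that is then applied to the embedded-region vertices. The function names, the corner-jitter model, and `max_shift=0.08` are hypothetical choices for illustration.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve the 3x3 homography H (with h33 = 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), linearized; same for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(a, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(H, points):
    """Map (x, y) points through H using homogeneous coordinates."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ H.T
    return mapped[:, :2] / mapped[:, 2:3]


def random_perspective(vertices, width, height, max_shift=0.08, rng=None):
    """Hypothetical shooting-style warp: jitter the image corners, then carry the
    embedded-region vertex labels through the same homography."""
    rng = np.random.default_rng() if rng is None else rng
    corners = np.array([[0, 0], [width - 1, 0], [width - 1, height - 1], [0, height - 1]],
                       dtype=float)
    jitter = rng.uniform(-max_shift, max_shift, size=(4, 2)) * [width, height]
    H = homography_from_points(corners, corners + jitter)
    return H, apply_homography(H, vertices)


verts = [(20, 30), (80, 30), (80, 70), (20, 70)]  # embedded-region corners, 100x100 image
H, new_verts = random_perspective(verts, 100, 100, rng=np.random.default_rng(0))
```

In a full pipeline the image itself would be warped with the same `H` (for example with OpenCV's `cv2.warpPerspective`), and photometric effects such as blur, noise, or color shifts would be applied afterwards, since they do not move the vertices.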
