DBDH: A Dual-Branch Dual-Head Neural Network for Invisible Embedded Regions Localization
Chengxin Zhao, Hefei Ling, Sijing Xie, Nan Sun, Zongyi Li, Yuxuan Shi, Jiazhong Chen
TL;DR
DBDH tackles the problem of localizing invisible embedded regions in offline-to-online messaging. It combines a low-level branch that analyzes high-frequency texture with a high-level branch that extracts context features. The two branches feed a detection head that predicts the four vertices of the embedded region and, during training only, a segmentation head that predicts the region mask. Together these enable precise and robust localization under print-shooting and screen-shooting distortions. Experiments on the WM-SS and WM-PIMoG datasets show state-of-the-art localization accuracy at low computational cost, supporting the value of high-pass texture cues and region-level supervision. Because localization is the first stage, preceding geometric correction and message retrieval, making it accurate improves the reliability of decoding hidden messages.
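Localization precedes geometric correction, and four accurate vertices are enough to undo a planar perspective distortion. As a hedged illustration of that step, the stdlib-only Python sketch below estimates a homography from four predicted corners with the standard direct linear transform (DLT) and maps the quadrilateral onto a canonical square. The function names, the vertex ordering (top-left, top-right, bottom-right, bottom-left), the corner coordinates, and the 256 x 256 target size are hypothetical and not taken from the paper, whose own correction pipeline may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations act on both sides at once.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate vertex configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def homography_from_vertices(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """DLT for exactly 4 correspondences, fixing h33 = 1 (8 equations, 8 unknowns)."""
    a: List[List[float]] = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        # u * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h.
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def apply_homography(h: List[List[float]], p: Point) -> Point:
    """Map a point through h using homogeneous coordinates."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


# Hypothetical vertices from a detection head, in photo pixel coordinates.
detected = [(12.0, 20.0), (300.0, 35.0), (290.0, 310.0), (5.0, 290.0)]
canonical = [(0.0, 0.0), (255.0, 0.0), (255.0, 255.0), (0.0, 255.0)]
H = homography_from_vertices(detected, canonical)
print([apply_homography(H, p) for p in detected])  # approximately the canonical corners
```

With h33 fixed to 1, four correspondences with no three points collinear give exactly eight equations for eight unknowns, which is why a four-vertex detection head suffices for this step. The flip side is that any vertex error propagates directly into the warp applied before decoding.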
Abstract
Embedding invisible hyperlinks or hidden codes in images as a replacement for QR codes has recently attracted considerable attention. This technology requires first localizing the embedded region in a captured photo before decoding. Existing methods that train models to find the invisible embedded region struggle to localize it accurately, which degrades decoding accuracy. This limitation arises primarily because CNNs are sensitive to low-frequency signals, whereas the embedded signal is typically high-frequency. Motivated by this observation, we propose a Dual-Branch Dual-Head (DBDH) neural network tailored for the precise localization of invisible embedded regions. Specifically, DBDH uses a low-level texture branch containing 62 high-pass filters to capture the high-frequency signals induced by embedding, and a high-level context branch to extract features that discriminate between embedded and normal regions. DBDH employs a detection head to directly detect the four vertices of the embedded region. In addition, we introduce an extra segmentation head that segments the mask of the embedded region during training. This head provides pixel-level supervision, which helps the model learn the embedded signals. Based on two state-of-the-art invisible offline-to-online messaging methods, we construct two datasets and corresponding augmentation strategies for training and testing localization models. Extensive experiments demonstrate that DBDH outperforms existing methods.
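A minimal, stdlib-only Python sketch can illustrate the intuition behind the texture branch. It assumes a handful of fixed, zero-sum high-pass kernels in the style of steganalysis rich-model (SRM) filters. These four kernels, their names, and the toy 8x8 patches are hypothetical stand-ins: the abstract does not list the paper's 62 filters, and in DBDH such filter responses would feed learned convolutional layers rather than a hand-computed energy score. The sketch only shows why such filters help: flat image content yields zero response, while a faint high-frequency perturbation yields a strong one.

```python
from typing import Dict, List

Image = List[List[float]]

# Hypothetical SRM-style fixed high-pass kernels (a small stand-in, NOT the
# paper's 62 filters). Every kernel sums to zero, so flat regions give 0.
HIGH_PASS_KERNELS: Dict[str, Image] = {
    "first_order_h": [[0, 0, 0], [0, -1, 1], [0, 0, 0]],
    "second_order_h": [[0, 0, 0], [1, -2, 1], [0, 0, 0]],
    "square_3x3": [[-0.25, 0.5, -0.25], [0.5, -1.0, 0.5], [-0.25, 0.5, -0.25]],
    "laplacian": [[0, 1, 0], [1, -4, 1], [0, 1, 0]],
}


def filter2d_valid(img: Image, kernel: Image) -> Image:
    """2-D cross-correlation without padding (what CNN 'convolution' layers compute)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * img[i + a][j + b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def texture_energy(img: Image) -> Dict[str, float]:
    """Mean absolute high-pass response per kernel, a crude texture descriptor."""
    energies = {}
    for name, k in HIGH_PASS_KERNELS.items():
        resp = filter2d_valid(img, k)
        flat = [abs(v) for row in resp for v in row]
        energies[name] = sum(flat) / len(flat)
    return energies


# Toy 8x8 grayscale patches: a flat "normal" region and the same region with a
# faint +/-2 checkerboard standing in for an invisible embedded signal.
normal = [[128.0] * 8 for _ in range(8)]
embedded = [[128.0 + 2.0 * (-1) ** (i + j) for j in range(8)] for i in range(8)]

print(texture_energy(normal))    # every kernel gives 0.0
print(texture_energy(embedded))  # non-zero for every kernel
```

Because every kernel sums to zero, constant (DC) content is removed exactly and smooth variation is strongly attenuated before any learning happens. What remains is dominated by fine-grained texture, which is where the abstract locates the embedded signal.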
