Deep Learning-based Text-in-Image Watermarking
Bishwa Karki, Chun-Hua Tsai, Pei-Chi Huang, Xin Zhong
TL;DR
This work introduces a deep-learning framework for text-in-image watermarking. It combines a Transformer-based encoder–decoder for text with Vision Transformer–based embedder and extractor modules that hide text inside cover images and retrieve it afterwards. A two-phase training regime with embedding-noise augmentation and a composite loss, expressed through $L_{total}$ and $L_1$ formulations, yields high text fidelity and image quality on COCO and Multi30K, as measured by BLEU (text) and SSIM (image). The method is robust to distortions such as rotation and blur, and it outperforms traditional DCT/DWT/SVD-based approaches and a GRU baseline, which highlights the value of end-to-end learning for adaptive watermarking. Overall, the work advances text-in-image watermarking by enabling adaptive, robust, and imperceptible embedding with practical security implications.
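To make the idea of a composite loss concrete, the following is a minimal sketch rather than the authors' formulation. This summary names $L_{total}$ and $L_1$ but does not define them, so the sketch assumes that an $L_1$ term compares the cover image with the watermarked image, that a cross-entropy term scores the recovered text, and that the two are combined as a weighted sum. The function names and the weights `w_image` and `w_text` are hypothetical.

```python
import math


def l1_loss(cover, marked):
    """Mean absolute difference between two flat lists of pixel values."""
    if len(cover) != len(marked):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(c - m) for c, m in zip(cover, marked)) / len(cover)


def cross_entropy(probs, target_ids):
    """Mean negative log-likelihood of the target token ids.

    probs[i] is the predicted distribution over the vocabulary at position i.
    """
    eps = 1e-12  # avoids log(0) when a target token gets zero probability
    nll = [-math.log(p[t] + eps) for p, t in zip(probs, target_ids)]
    return sum(nll) / len(target_ids)


def total_loss(cover, marked, probs, target_ids, w_image=1.0, w_text=1.0):
    """Assumed composite loss: weighted image term plus weighted text term."""
    image_term = l1_loss(cover, marked)           # imperceptibility
    text_term = cross_entropy(probs, target_ids)  # text fidelity
    return w_image * image_term + w_text * text_term
```

Under these assumptions, raising `w_image` favors imperceptibility, while raising `w_text` favors accurate text recovery. In a real training setup the same terms would be computed on tensors by an automatic-differentiation framework.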
Abstract
In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a technique that embeds textual information within images and later extracts it to enhance data security and integrity. Our method uses Transformer-based architectures for text processing and Vision Transformers for image feature extraction, and it sets new benchmarks in the domain. The proposed method represents the first application of deep learning in text-in-image watermarking that improves adaptivity, allowing the model to adjust to specific image characteristics and emerging threats. In our evaluation, the method demonstrates superior robustness compared to traditional watermarking techniques, together with enhanced imperceptibility, so that the watermark remains undetectable across varied image content.
