Deep Learning-based Text-in-Image Watermarking

Bishwa Karki; Chun-Hua Tsai; Pei-Chi Huang; Xin Zhong

TL;DR

This work addresses the challenge of text-in-image watermarking by introducing a deep-learning framework that combines a Transformer-based encoder–decoder for text with Vision Transformer–based embedder/extractor modules to hide and retrieve text inside cover images. A two-phase training regime with embedding-noise augmentation and a composite loss, including $L_{total}$ and $L_1$ formulations, yields high text fidelity and image quality on COCO and Multi30K, evidenced by BLEU and SSIM metrics. The method demonstrates robustness to distortions such as rotation and blur, outperforms traditional DCT/DWT/SVD-based approaches and a GRU baseline, and highlights the value of end-to-end learning for adaptive watermarking. Collectively, the work advances text-in-image watermarking by enabling adaptive, robust, and imperceptible embedding with practical security implications.

Abstract

In this work, we introduce a novel deep learning-based approach to text-in-image watermarking, a method that embeds textual information within images and later extracts it to enhance data security and integrity. Using Transformer-based architectures for text processing and Vision Transformers for image feature extraction, our method sets new benchmarks in the domain. The proposed method represents the first application of deep learning in text-in-image watermarking that improves adaptivity, allowing the model to adjust to specific image characteristics and emerging threats. In testing and evaluation, our method demonstrated superior robustness compared to traditional watermarking techniques while achieving enhanced imperceptibility, so that the watermark remains undetectable across various image contents.

Paper Structure (17 sections, 5 equations, 8 figures, 1 table)

Introduction
Related Works
Image Watermarking and Deep Learning
Text-in-Image Watermarking
The Proposed Method
The Encoder and Decoder Networks
The Embedder and Extractor Networks
Training and Loss Functions
Encoder-Decoder Networks Pre-training
Training the entire Network
Experiments and Analysis
Dataset and Implementation Details
Training & Testing Results
Robustness
Comparative Analysis
...and 2 more sections

Figures (8)

Figure 1: Text-in-Image Watermarking.
Figure 2: Overview of the proposed method.
Figure 3: Text Encoder-Decoder Pre-training.
Figure 4: Pretraining loss.
Figure 5: Entire network training loss.
...and 3 more figures
