T2IW: Joint Text to Image & Watermark Generation

An-An Liu; Guokai Zhang; Yuting Su; Ning Xu; Yongdong Zhang; Lanjun Wang

TL;DR

This work introduces T2IW, a joint text-to-image and watermark generation framework that embeds an invisible watermark into the generated image to enable traceability and security without sacrificing visual quality. It combines a three-phase pipeline (joint generation, image decoupling via a non-cooperative game, and an optimization strategy) with a U-Net backbone to produce a compound image $x_c$ from text $i_t$ and noise $i_z$, while ensuring recoverability of the revealed image $x_r$ and watermark $w_r$. The method leverages Shannon information theory and game-theoretic decoupling to balance information allocation between image content and watermark signals, and it is trained with attacks-in-the-loop to enhance robustness. Comprehensive experiments on RAT-GAN and AttnGAN across Oxford-102, CUB-birds, and MS-COCO show maintained image quality (IS/FID), strong watermark invisibility (PSNR/SSIM/LPIPS), and robust watermark reconstruction under varied post-processing attacks, demonstrating practical potential for traceability in AIGC pipelines.

Abstract

Recent developments in text-conditioned image generative models have revolutionized the production of realistic results. Unfortunately, this has also led to an increase in privacy violations and the spread of false information, which creates a need for traceability, privacy protection, and other security measures. However, existing text-to-image paradigms lack the technical capabilities to link traceable messages with image generation. In this study, we introduce a novel task for the joint generation of text to image and watermark (T2IW). This T2IW scheme ensures minimal damage to image quality when generating a compound image by forcing the semantic feature and the watermark signal to be compatible at the pixel level. Additionally, by utilizing principles from Shannon information theory and non-cooperative game theory, we are able to separate the revealed image and the revealed watermark from the compound image. Furthermore, we strengthen the watermark robustness of our approach by subjecting the compound image to various post-processing attacks, with minimal pixel distortion observed in the revealed watermark. Extensive experiments demonstrate strong results in image quality, watermark invisibility, and watermark robustness, supported by our proposed set of evaluation metrics.
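As a rough illustration of how watermark invisibility and a post-processing attack can be quantified, the sketch below computes PSNR (one of the invisibility metrics named in the TL;DR) before and after a Gaussian noise attack. It is not the authors' implementation: the random "image", the uniform perturbation standing in for watermark embedding, and the helper names `psnr` and `gaussian_noise_attack` are all assumptions made for illustration.

```python
import math
import random


def psnr(reference, distorted, peak=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel lists."""
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: no distortion at all
    return 10.0 * math.log10(peak ** 2 / mse)


def gaussian_noise_attack(pixels, sigma, rng):
    """Add zero-mean Gaussian noise, then clip each pixel back to [0, 1]."""
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


rng = random.Random(0)

# Hypothetical stand-in for a watermark-free generated image (64x64, flattened).
clean = [rng.random() for _ in range(64 * 64)]

# Toy "embedding": a small bounded perturbation, NOT the T2IW joint generation.
compound = [min(1.0, max(0.0, p + rng.uniform(-0.01, 0.01))) for p in clean]

# Simulated post-processing attack on the compound image.
attacked = gaussian_noise_attack(compound, sigma=0.05, rng=rng)

invisibility_db = psnr(clean, compound)   # higher means the change is less visible
after_attack_db = psnr(clean, attacked)   # drops once noise is added
```

A higher PSNR between the clean and compound images indicates a less visible watermark. The same function could be applied to an original and a revealed watermark to gauge reconstruction quality under attack.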

Paper Structure

This paper contains 38 sections, 22 equations, 10 figures, 4 tables.

Introduction
Related Work
Text-to-Image Generation
Single-stage Generation
Multi-stage Generation
Image Watermarking
Watermarking on Real Images
Watermarking on Generated Images
Problem Statement
Methodology
Framework
Joint Generation
Image Decoupling
Optimization Strategy
Evaluation Metrics
...and 23 more sections

Figures (10)

Figure 1: Workflow of text-to-image, watermarking, and our proposed T2IW. (a) Text-to-image synthesizes a high-quality image conditioned on the input text. (b) Watermarking invisibly hides a watermark in a real image. (c) Text to image & watermark seeks to integrate a message-bearing watermark into the image generation procedure, thereby enabling the generation of the compound image and the decoding of the revealed watermark and the revealed image, even in the presence of attacks.
Figure 2: Overview of our proposed T2IW framework, comprising three main components. (a) Joint generation incorporates noise, text, and watermark signals to create a compound image, that is, an image carrying the invisible watermark. (b) Image decoupling uses non-cooperative game theory to establish a pair of decoders and the information allocation strategies, enabling the decoupling of the image and watermark from a compound image. (c) The optimization strategy encompasses the objective functions for the revealed image, the revealed watermark, and the compound image.
Figure 3: Illustration of the non-cooperative game in the T2IW scenario. (a) shows a schematic of an arbitrary point $(\xi_{x_r},\xi_{w_r})$ on the hyperplane approaching the Nash equilibrium $(\xi^*_{x_r},\xi^*_{w_r})$, and (b) shows the strategy implementation during watermark revealing.
Figure 4: Watermark robustness curves of T2IW on RAT-GAN under post-processing attacks of different intensities, i.e., Gaussian noise, salt-and-pepper noise, rotation, random cropping, Gaussian blur, and brightness.
Figure 5: Parameter analysis curves for the feature coupling iteration number.
...and 5 more figures
