T2IW: Joint Text to Image & Watermark Generation
An-An Liu, Guokai Zhang, Yuting Su, Ning Xu, Yongdong Zhang, Lanjun Wang
TL;DR
This work introduces T2IW, a joint text-to-image and watermark generation framework that embeds an invisible watermark into the generated image to enable traceability and security without sacrificing visual quality. It combines a three-phase pipeline (joint generation, image decoupling via a non-cooperative game, and an optimization strategy) with a U-Net backbone to produce a compound image $x_c$ from text $i_t$ and noise $i_z$, while ensuring recoverability of the revealed image $x_r$ and watermark $w_r$. The method leverages Shannon information theory and game-theoretic decoupling to balance information allocation between image content and watermark signals, and it is trained with attacks-in-the-loop to enhance robustness. Comprehensive experiments on RAT-GAN and AttnGAN across Oxford-102, CUB-birds, and MS-COCO show maintained image quality (IS/FID), strong watermark invisibility (PSNR/SSIM/LPIPS), and robust watermark reconstruction under varied post-processing attacks, demonstrating practical potential for traceability in AIGC pipelines.
Abstract
Recent developments in text-conditioned image generative models have revolutionized the production of realistic images. Unfortunately, this has also led to an increase in privacy violations and the spread of false information, which creates a need for traceability, privacy protection, and other security measures. However, existing text-to-image paradigms lack the technical capability to link traceable messages with image generation. In this study, we introduce a novel task: the joint generation of an image and a watermark from text (T2IW). The T2IW scheme minimizes damage to image quality when generating a compound image by forcing the semantic features and the watermark signal to be compatible at the pixel level. Additionally, by utilizing principles from Shannon information theory and non-cooperative game theory, we separate the revealed image and the revealed watermark from the compound image. Furthermore, we strengthen the watermark robustness of our approach by subjecting the compound image to various post-processing attacks during training, with minimal pixel distortion observed in the revealed watermark. Extensive experiments demonstrate strong results in image quality, watermark invisibility, and watermark robustness, as measured by our proposed set of evaluation metrics.
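To make the notion of watermark invisibility concrete, the following is a minimal sketch of PSNR, one of the invisibility metrics named in the TL;DR. It is not the T2IW pipeline: the pixel lists `plain`, `compound`, and `attacked` are hypothetical stand-ins, the "embedding" is a toy perturbation of one gray level per pixel, and the "attack" is additive Gaussian noise chosen only for illustration.

```python
import math
import random


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)


def clip(value, low=0.0, high=255.0):
    """Clamp a pixel value to the valid intensity range."""
    return max(low, min(high, value))


rng = random.Random(0)

# Hypothetical stand-in for an image generated without a watermark (64x64, flattened).
plain = [rng.uniform(16.0, 239.0) for _ in range(64 * 64)]

# Toy "compound image": every pixel shifted by exactly one gray level.
# This is an assumption for illustration, not the paper's embedding.
compound = [p + rng.choice((-1.0, 1.0)) for p in plain]

# Toy post-processing attack: additive Gaussian noise with sigma = 5.
attacked = [clip(c + rng.gauss(0.0, 5.0)) for c in compound]

invisibility_psnr = psnr(plain, compound)
attacked_psnr = psnr(plain, attacked)

print(f"plain vs. compound: {invisibility_psnr:.2f} dB")
print(f"plain vs. attacked: {attacked_psnr:.2f} dB")
```

A higher PSNR between the watermark-free image and the compound image means the watermark is harder to see. The same function could be applied to the original and revealed watermarks to quantify distortion after an attack.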
