FontGuard: A Robust Font Watermarking Approach Leveraging Deep Font Knowledge

Kahim Wong, Jicheng Zhou, Kemou Li, Yain-Whar Si, Xiaowei Wu, Jiantao Zhou

TL;DR

FontGuard addresses robust font-based watermarking for AI-generated text. It uses a deep font model to synthesize high-quality, diverse watermarked fonts by perturbing hidden style features rather than pixels. A font-manifold-driven encoder is paired with a CLIP-based decoder trained via language-guided contrastive learning, which enables reliable bit recovery under realistic distortions. A generalized variant, FontGuard-GEN, generates watermarked versions of unseen fonts without retraining by incorporating style prompts and a style-consistent loss. Compared with state-of-the-art methods, decoding accuracy improves by +5.4%, +7.4%, and +5.8% under synthetic, cross-media, and online social network (OSN) distortions, respectively, while LPIPS is reduced by 52.7%, and the method generalizes well to unseen fonts. The approach thus offers scalable, practical font watermarking for copyright protection, provenance, and compliance in AI-generated text.

Abstract

The proliferation of AI-generated content raises significant forensic and security concerns, such as source tracing and copyright protection, highlighting the need for effective watermarking technologies. Font-based text watermarking has emerged as an effective way to embed information that supports the copyright, traceability, and compliance of generated text. Existing font watermarking methods usually neglect essential font knowledge, which leads to low-quality watermarked fonts and limited embedding capacity. These methods are also vulnerable to real-world distortions, low-resolution fonts, and inaccurate character segmentation. In this paper, we introduce FontGuard, a novel font watermarking model that harnesses the capabilities of font models and language-guided contrastive learning. Unlike previous methods that focus solely on pixel-level alteration, FontGuard modifies fonts by altering hidden style features, resulting in better font quality after watermark embedding. We also leverage the font manifold to increase the embedding capacity of the proposed method by generating a large number of font variants that closely resemble the original font. Furthermore, in the decoder, we employ image-text contrastive learning to reconstruct the embedded bits, which achieves robustness against various real-world transmission distortions. FontGuard outperforms state-of-the-art methods by +5.4%, +7.4%, and +5.8% in decoding accuracy under synthetic, cross-media, and online social network distortions, respectively, while improving the visual quality by 52.7% in terms of LPIPS. Moreover, FontGuard uniquely allows the generation of watermarked fonts for unseen fonts without re-training the network. The code and dataset are available at https://github.com/KAHIMWONG/FontGuard.

Paper Structure

This paper contains 26 sections, 22 equations, 15 figures, 8 tables, 1 algorithm.

Figures (15)

  • Figure 1: An overview of text watermarking.
  • Figure 2: System model for our font-based text watermarking.
  • Figure 3: The watermark encoder/decoder in FontGuard. The model employs an end-to-end training approach, where the watermarking encoder and decoder are jointly trained alongside noise layers. The encoder consists of two main components: a well-trained font model and a learnable weight module. The font model provides a manifold for perturbed fonts, while the weight module encourages the encoder to generate appropriate fonts by learning. The decoder, based on a pre-trained CLIP model, is designed to maximize the similarity between a noisy, watermarked font image and its corresponding textual label.
  • Figure 4: The style-content disentanglement structure of font models.
  • Figure 5: The behavior of the Differential Binarization Filter.
  • ...and 10 more figures
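
The Figure 3 caption describes a decoder that maximizes the similarity between a noisy watermarked font image and its textual label. A minimal sketch of that idea follows; it is not the authors' implementation. It assumes that image and text embeddings are already available (random vectors stand in for CLIP outputs), that each candidate label encodes one 2-bit message, and that the temperature is 0.07. All function names here are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def similarity_logits(image_emb, text_emb, temperature=0.07):
    """Cosine similarity of every image to every label, divided by a temperature."""
    return l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature

def contrastive_loss(image_emb, text_emb, labels, temperature=0.07):
    """Cross-entropy over labels: pulls each image toward its own label embedding."""
    logits = similarity_logits(image_emb, text_emb, temperature)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def decode(image_emb, text_emb):
    """Pick, for each image, the index of the most similar label."""
    return similarity_logits(image_emb, text_emb).argmax(axis=1)

# Assumption: random vectors stand in for encoder outputs.
rng = np.random.default_rng(0)
text_emb = rng.normal(size=(4, 32))       # 4 hypothetical labels, i.e. 2 bits
true_labels = np.array([2, 0, 3, 1])
# "Noisy watermarked images": the matching label embedding plus small noise.
image_emb = text_emb[true_labels] + 0.1 * rng.normal(size=(4, 32))

decoded = decode(image_emb, text_emb)
bits = [format(int(i), "02b") for i in decoded]  # label index -> bit string
loss = contrastive_loss(image_emb, text_emb, true_labels)
print(decoded, bits, loss)
```

Decoding by nearest label means the decoder never regresses bits directly; it only ranks candidate labels, so the number of labels bounds the bits recoverable per glyph in this sketch.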