Fast, Secure, and High-Capacity Image Watermarking with Autoencoded Text Vectors

Gautier Evennou, Vivien Chappelier, Ewa Kijak

TL;DR

LatentSeal reframes watermarking as semantic communication: a lightweight text autoencoder maps a full-sentence message to a compact 256-dimensional unit-norm latent vector, a finetuned watermark model embeds that vector robustly in the image, and a secret, invertible rotation secures it.

Abstract

Most image watermarking systems focus on robustness, capacity, and imperceptibility while treating the embedded payload as meaningless bits. This bit-centric view imposes a hard ceiling on capacity and prevents watermarks from carrying useful information. We propose LatentSeal, which reframes watermarking as semantic communication: a lightweight text autoencoder maps full-sentence messages into a compact 256-dimensional unit-norm latent vector, which is robustly embedded by a finetuned watermark model and secured through a secret, invertible rotation. The resulting system hides full-sentence messages, decodes in real time, and survives valuemetric and geometric attacks. It surpasses prior state of the art in BLEU-4 and Exact Match on several benchmarks, while breaking through the long-standing 256-bit payload ceiling. It also introduces a statistically calibrated score that yields a ROC AUC score of 0.97-0.99, and practical operating points for deployment. By shifting from bit payloads to semantic latent vectors, LatentSeal enables watermarking that is not only robust and high-capacity, but also secure and interpretable, providing a concrete path toward provenance, tamper explanation, and trustworthy AI governance. Models, training and inference code, and data splits will be available upon publication.

Paper Structure

This paper contains 61 sections, 5 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: LatentSeal overview illustrated through an image tampering detection application. Bob writes a message $m$ describing the image's content, then feeds it to the autoencoder, which outputs a latent vector $y$. To introduce security, we apply a secret rotation to $y$, conditioned on a secret key, and obtain $y_r$. LatentSeal embeds $y_r$ within the input image. During transmission, the image is intercepted and transformed by Eve. Finally, Alice receives the image, performs the watermark extraction process, and recovers an estimated rotated latent vector $\hat{y}_r$. Using her secret key, identical to Bob's, she applies the inverse rotation to $\hat{y}_r$ and decodes the resulting latent vector $\hat{y}$ to reconstruct the original textual message. By comparing the received image's content with the decoded message $\hat{m}$, Alice can detect discrepancies and flag tampering accordingly.
  • Figure 2: Text auto-encoder architecture. Encoder (left) outputs are fed as memory (key,value) to the MHA module of the decoder layers (right).
  • Figure 3: AE decoding speed vs. LLMZip; we report throughput in tokens per second.
  • Figure 4: ROC of $\ell=-\log_{10}\rho$ on non-catastrophic transforms on Wikitext/PixMo-Cap.
  • Figure 5: We study train/test n-gram overlap as a heuristic for how well the model generalizes to new data. We show that the PixMo-Cap test set is quite repetitive, so we design a metric to maximize the hardness of the test set, keeping only the hardest 5% of samples.
  • ...and 5 more figures