Authenticated Contradictions from Desynchronized Provenance and Watermarking

Alexander Nemecek, Hengzhi He, Guang Cheng, Erman Ayday

TL;DR

This work formalizes and empirically demonstrates the Integrity Clash, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation.

Abstract

Cryptographic provenance standards such as C2PA and invisible watermarking are positioned as complementary defenses for content authentication, yet the two verification layers are technically independent: neither conditions on the output of the other. This work formalizes and empirically demonstrates the $\textit{Integrity Clash}$, a condition in which a digital asset carries a cryptographically valid C2PA manifest asserting human authorship while its pixels simultaneously carry a watermark identifying it as AI-generated, with both signals passing their respective verification checks in isolation. We construct metadata washing workflows that produce these authenticated fakes through standard editing pipelines, requiring no cryptographic compromise, only the semantic omission of a single assertion field permitted by the current C2PA specification. To close this gap, we propose a cross-layer audit protocol that jointly evaluates provenance metadata and watermark detection status, achieving 100% classification accuracy across 3,500 test images spanning four conflict-matrix states and three realistic perturbation conditions. Our results demonstrate that the gap between these verification layers is unnecessary and technically straightforward to close.
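
The cross-layer idea in the abstract can be illustrated with a minimal sketch; it is not the authors' implementation. It assumes the two verification layers have already run and reduces their outputs to three inputs: whether the manifest signature validated, the manifest's `digitalSourceType` value (or `None` when the field is absent), and whether the watermark detector fired. The function name `audit`, the suffix check against the IPTC term `trainedAlgorithmicMedia`, and the labels for the quadrants other than Q4a and Q4b are assumptions made for illustration.

```python
from typing import Optional

# Assumption: AI origin is disclosed via the IPTC digital source type term
# "trainedAlgorithmicMedia"; only the suffix is checked here for brevity.
AI_SOURCE_SUFFIX = "trainedAlgorithmicMedia"


def audit(manifest_valid: bool,
          digital_source_type: Optional[str],
          watermark_detected: bool) -> str:
    """Jointly evaluate provenance metadata and watermark status.

    Hypothetical helper: Q4a/Q4b follow the paper's naming; the other
    labels are illustrative placeholders, not the paper's terms.
    """
    discloses_ai = (digital_source_type is not None
                    and digital_source_type.endswith(AI_SOURCE_SUFFIX))

    if manifest_valid and watermark_detected:
        # Both layers pass in isolation; only the joint view separates
        # honest disclosure from semantic omission.
        if discloses_ai:
            return "Q4a: Verified Synthetic"
        return "Q4b: Authenticated Fake"
    if manifest_valid:
        return "valid manifest, no watermark"
    if watermark_detected:
        return "no valid manifest, watermark detected"
    return "no valid manifest, no watermark"


if __name__ == "__main__":
    # A valid manifest that omits digitalSourceType on a watermarked image.
    print(audit(True, None, True))
```

The point of the sketch is that the Q4b branch is reachable only when both inputs are consulted together: each check on its own returns a passing result.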

Paper Structure (20 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Cross-layer conflict matrix. Q4 splits into Q4a (Verified Synthetic), where the manifest discloses AI generation, and Q4b (Authenticated Fake), where it does not. The transition requires no cryptographic compromise, only semantic omission of the AI origin assertion.
  • Figure 2: Comparison of an honestly declared AI-generated image manifest (Q4a, top) and an authenticated fake (Q4b, bottom). The digitalSourceType field is entirely absent in the attack manifest, and the action and software agent are replaced with generic editing descriptors. The attack requires no cryptographic compromise, only the semantic omission of the AI origin assertion, which the C2PA specification does not mandate [C2PA2026spec].
  • Figure 3: The same AI-generated, watermarked image as displayed by the Content Credentials Verify tool under three conditions. Left: the original watermarked image with no manifest attached, for which the tool reports no content credential. Middle: an honest AI-disclosure manifest, for which the tool correctly identifies the image as AI-generated. Right: a misleading human-edited manifest, for which the tool displays the image as human-edited with no mention of AI involvement. The output is determined entirely by the attached metadata; watermark signals are not inspected. The issuer warning refers only to our research certificate and would not appear with a trusted credential.
  • Figure 4: Bit accuracy distributions across experimental conditions ($N=500$ per condition). The dashed line marks the 0.75 detection threshold. Baseline images (no watermark) cluster at chance level ($\sim$0.50), confirming no false detections. All watermarked conditions remain well above the threshold, with screenshot simulation producing the widest spread (min$=$0.906). No image crosses the detection boundary under any perturbation.
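
The detection rule behind Figure 4 can be sketched as follows, assuming bit accuracy means the fraction of decoded watermark bits that match the expected payload. The function names are hypothetical, and treating a score exactly at 0.75 as a detection is an assumption, since the caption only states the threshold value.

```python
from typing import Sequence

DETECTION_THRESHOLD = 0.75  # threshold marked by the dashed line in Figure 4


def bit_accuracy(decoded: Sequence[int], expected: Sequence[int]) -> float:
    """Fraction of decoded bits that agree with the expected payload."""
    if len(decoded) != len(expected) or len(expected) == 0:
        raise ValueError("payloads must be non-empty and of equal length")
    matches = sum(d == e for d, e in zip(decoded, expected))
    return matches / len(expected)


def is_watermarked(decoded: Sequence[int],
                   expected: Sequence[int],
                   threshold: float = DETECTION_THRESHOLD) -> bool:
    """Assumed rule: report a detection when accuracy reaches the threshold."""
    return bit_accuracy(decoded, expected) >= threshold


if __name__ == "__main__":
    payload = [1, 0, 1, 1, 0, 0, 1, 0]
    noisy = [1, 0, 1, 1, 0, 0, 1, 1]  # one flipped bit out of eight
    print(bit_accuracy(noisy, payload), is_watermarked(noisy, payload))
```

Under this rule an unwatermarked image, whose decoded bits match a random payload about half the time, scores near 0.50 and falls below the threshold, which is consistent with the baseline cluster described in the caption.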