MolMark: Safeguarding Molecular Structures through Learnable Atom-Level Watermarking

Runwen Hu, Peilin Chen, Keyan Ding, Shiqi Wang

TL;DR

MolMark introduces a deep learning-based watermarking framework for AI-generated molecules to protect provenance and IP while preserving molecular functionality. It uses an encoder/decoder pair and SE(3)-invariant features, with a dynamic training objective to balance watermark recoverability and chemical fidelity. Empirical results on QM9, GEOM-DRUG, and generative models show 16-bit watermarks with >95% extraction accuracy and minimal perturbation to physicochemical properties and docking performance. This work enables verifiable authorship and traceability for AI-driven molecular design, promoting trustworthy and accountable discovery.

Abstract

AI-driven molecular generation is reshaping drug discovery and materials design, yet the lack of protection mechanisms leaves AI-generated molecules vulnerable to unauthorized reuse and provenance ambiguity. This limitation undermines both scientific reproducibility and intellectual property security. To address this challenge, we propose MolMark, the first deep learning-based watermarking framework for molecules, which is designed to embed high-fidelity digital signatures into molecules without compromising molecular functionality. MolMark learns to modulate chemically meaningful atom-level representations and enforces geometric robustness through SE(3)-invariant features, so that the watermark survives rotation, translation, and reflection. Additionally, MolMark integrates seamlessly with AI-based molecular generative models, enabling watermarking to be treated as a learned transformation with minimal interference to molecular structures. Experiments on benchmark datasets (QM9, GEOM-DRUG) and state-of-the-art molecular generative models (GeoBFN, GeoLDM) demonstrate that MolMark can embed 16-bit watermarks while retaining more than 90% of essential molecular properties, preserving downstream performance, and enabling >95% extraction accuracy under SE(3) transformations. MolMark establishes a principled pathway for unifying molecular generation with verifiable authorship, supporting trustworthy and accountable AI-driven molecular discovery.
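To make the invariance claim concrete, the sketch below checks that one simple invariant descriptor, the list of pairwise interatomic distances, is unchanged by a rotation, a translation, and a reflection. This is an illustration only: the abstract does not say which invariant features MolMark uses, and the toy coordinates and helper names (`pairwise_distances`, `rotate_z`, `translate`, `reflect_x`) are assumptions introduced here. Strictly speaking, reflections lie outside SE(3); distance-based features happen to be invariant to them as well.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Distances between all atom pairs, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def rotate_z(coords, angle):
    """Rotate every atom about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def translate(coords, shift):
    """Shift every atom by the same displacement vector."""
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]


def reflect_x(coords):
    """Mirror every atom through the yz-plane."""
    return [(-x, y, z) for x, y, z in coords]


# Hypothetical three-atom geometry (coordinates in angstroms), not from the paper.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

reference = pairwise_distances(coords)
moved = reflect_x(translate(rotate_z(coords, 1.1), (2.0, -3.5, 0.7)))
transformed = pairwise_distances(moved)

# The descriptor should agree up to floating-point round-off.
invariant = all(
    math.isclose(r, t, abs_tol=1e-9) for r, t in zip(reference, transformed)
)
print("distances preserved:", invariant)
```

A decoder that reads only such invariant quantities sees the same input regardless of how the molecule is posed, which is the intuition behind extracting a watermark after rigid transformations.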

Paper Structure

This paper contains 22 sections, 22 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Application scenarios of MolMark, from protecting molecules to tracking data leakage. Alice applies MolMark to embed watermarks into molecules and distributes a uniquely watermarked copy to each user. When Elaine leaks her copy to unauthorized users, Alice can detect the leakage and trace it back to Elaine.
  • Figure 2: The framework of MolMark. The encoder $\mathcal{E}_\phi$ embeds watermarks into the original molecules, generating watermarked molecules with minimal impact on molecular properties and functionalities. Molecular transformations are applied to simulate real-world processing of the molecules. The decoder $\mathcal{D}_\theta$ extracts the watermarks from the watermarked molecules, enabling reliable copyright protection.
  • Figure 3: The detailed structure of the encoder $\mathcal{E}_\phi$, including the position processing module, atom embedder, edge embedder, and cross processing module. The atom embedder and the edge embedder exploit atom-level features, ensuring the effectiveness of the watermarked molecules.
  • Figure 4: The detailed structures of decoder $\mathcal{D}_\theta$, including the position processing module, atom embedder, edge embedder, and message extraction module. The atom embedder and edge embedder share the same structures as the counterparts in the encoder, but are trained with independent parameters.
  • Figure 5: Structures of eight pairs of molecules. Each original molecule and its watermarked counterpart are arranged vertically. The structures change only slightly after watermark embedding, with all RMSD values below 0.03 Å.
  • ...and 6 more figures
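
Two quantities from the captions above can be sketched in a few lines: tracing a leaked copy back to a user (Figure 1) and the RMSD between an original and a watermarked structure (Figure 5). This is a minimal illustration under stated assumptions, not the paper's procedure: the nearest-match rule, the registry contents, the user "Bob", the toy coordinates, and all function names are hypothetical, and the RMSD here assumes matching atom order with no prior alignment.

```python
import math


def bit_accuracy(sent, recovered):
    """Fraction of watermark bits that match between two equal-length messages."""
    if len(sent) != len(recovered):
        raise ValueError("messages must have the same length")
    return sum(a == b for a, b in zip(sent, recovered)) / len(sent)


def trace_leak(recovered, registry):
    """Return the registered user whose watermark best matches, with its accuracy."""
    best = max(registry, key=lambda user: bit_accuracy(registry[user], recovered))
    return best, bit_accuracy(registry[best], recovered)


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two conformers (same atom order, no alignment)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformers must have the same number of atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical 16-bit watermarks assigned by the owner to each recipient.
registry = {
    "Bob":    [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0],
    "Elaine": [1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1],
}

# Simulate a leaked copy whose decoded message has one bit flipped.
recovered = list(registry["Elaine"])
recovered[3] ^= 1

user, accuracy = trace_leak(recovered, registry)
print(user, accuracy)

# Hypothetical two-atom structure before and after a small perturbation (angstroms).
original = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
watermarked = [(0.0, 0.0, 0.02), (1.0, 0.0, -0.02)]
deviation = rmsd(original, watermarked)
print(round(deviation, 4))
```

In practice an owner would also want a minimum accuracy threshold before attributing a leak, so that an unwatermarked molecule is not matched to whichever user happens to be closest.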