Mirror Diffusion Models for Constrained and Watermarked Generation

Guan-Horng Liu, Tianrong Chen, Evangelos A. Theodorou, Molei Tao

TL;DR

Mirror Diffusion Models (MDM) address diffusion-based generation on convex constrained sets by mapping data through a mirror map $\nabla\phi$ to an unconstrained dual space, learning a standard Euclidean diffusion there, and mapping samples back with the inverse map $\nabla\phi^*$. This yields simulation-free training, tractable dual-space scores, and exact constraint satisfaction by design. Empirically, MDM outperforms reflected-diffusion baselines on $\ell_2$-balls and simplices, and it enables watermarked generation with private tokens, enforcing the constraints while maintaining high sample fidelity (measured by FID and related metrics). The approach extends tractable diffusion to constrained domains and suggests new applications in watermarking and copyright protection, with potential extensions to other constraint families and integration with latent-flow methods.

Abstract

Modern successes of diffusion models in learning complex, high-dimensional data distributions are attributed, in part, to their capability to construct diffusion processes with analytic transition kernels and score functions. The tractability results in a simulation-free framework with stable regression losses, from which reversed, generative processes can be learned at scale. However, when data is confined to a constrained set as opposed to a standard Euclidean space, these desirable characteristics appear to be lost based on prior attempts. In this work, we propose Mirror Diffusion Models (MDM), a new class of diffusion models that generate data on convex constrained sets without losing any tractability. This is achieved by learning diffusion processes in a dual space constructed from a mirror map, which, crucially, is a standard Euclidean space. We derive efficient computation of mirror maps for popular constrained sets, such as simplices and $\ell_2$-balls, showing significantly improved performance of MDM over existing methods. For safety and privacy purposes, we also explore constrained sets as a new mechanism to embed invisible but quantitative information (i.e., watermarks) in generated data, for which MDM serves as a compelling approach. Our work brings new algorithmic opportunities for learning tractable diffusion on complex domains. Our code is available at https://github.com/ghliu/mdm
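
To make the dual-space construction concrete, the following is a minimal sketch of a mirror map and its inverse for the $\ell_2$-ball ${\cal M} = \{x : \|x\|_2^2 < R\}$. It assumes a log-barrier potential $\phi(x) = -\gamma \log(R - \|x\|_2^2)$; the function names and the parameter `gamma` are illustrative choices, not identifiers taken from the released code. The point it illustrates is that any vector in the dual space maps back to a point strictly inside the ball, so a sample produced by an unconstrained diffusion satisfies the constraint by construction.

```python
import math


def ball_mirror_map(x, R=1.0, gamma=1.0):
    """Assumed log-barrier mirror map: grad phi(x) = 2*gamma*x / (R - ||x||^2).

    Sends a point strictly inside the ball {||x||^2 < R} to unconstrained R^d.
    """
    sq_norm = sum(v * v for v in x)
    if sq_norm >= R:
        raise ValueError("x must lie strictly inside the ball")
    scale = 2.0 * gamma / (R - sq_norm)
    return [scale * v for v in x]


def ball_inverse_mirror_map(y, R=1.0, gamma=1.0):
    """Inverse map: grad phi*(y) = R*y / (sqrt(R*||y||^2 + gamma^2) + gamma).

    Obtained by solving y = grad phi(x) for x; the result has ||x||^2 < R.
    """
    sq_norm = sum(v * v for v in y)
    scale = R / (math.sqrt(R * sq_norm + gamma * gamma) + gamma)
    return [scale * v for v in y]


# Round trip: primal -> dual -> primal recovers the original point.
x = [0.3, -0.4]
y = ball_mirror_map(x)
x_back = ball_inverse_mirror_map(y)

# A dual-space point far from the origin still maps inside the ball.
x_far = ball_inverse_mirror_map([1e3, -2e3])
```

In an MDM-style pipeline, training data would be mapped with `ball_mirror_map` before fitting a standard diffusion model, and generated dual samples would be mapped back with `ball_inverse_mirror_map`.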

Paper Structure (14 sections, 33 equations, 18 figures, 1 table)

Figures (18)

  • Figure 1: Mirror Diffusion Models (MDM) are a new class of diffusion models for convex constrained manifolds ${\cal M} \subseteq {\mathbb{R}}^d$. (left) Instead of learning score-approximate diffusions on ${\cal M}$, MDM applies a mirror map $\nabla\phi$ and learns tractable diffusions in its unconstrained dual space $\nabla\phi({\cal M}) = {\mathbb{R}}^d$. (right) We also present MDM for watermarked generation, where generated content (e.g., images) lives in a high-dimensional token-constrained set ${\cal M}$ that can be certified only by the private user.
  • Figure 2: MDM for watermarked generation: (left) We first construct a constrained set ${\cal M}$ from a set of user-defined tokens that are kept private from other users. (right) MDM can be instantiated either by learning the corresponding dual-space diffusions or by projecting pretrained (i.e., unwatermarked) diffusion models onto ${\cal M}$. In both cases, MDM embeds watermarks that can be certified only by the user.
  • Figure 3: Examples of how mirror maps $\nabla\phi$ push forward constrained distributions to unconstrained ones, for (left) an $\ell_2$-ball where ${\cal M} := \{ x \in {\mathbb{R}}^2: \|x\|_2^2 < R \}$ and (right) a simplex $\Delta_3$ where ${\cal M} := \{ x \in {\mathbb{R}}^{2} : \sum_{i=1}^{2} x_i \le 1, x_i \ge 0 \}$. Note that for the simplex, $x_3=1-x_1-x_2$ is a redundant coordinate; see Section 3.2 for more details.
  • Figure 4: Comparison between $s_i$ induced by standard log-barriers vs. hyperbolic tangents in Eq. (\ref{eq:scale}).
  • Figure 5: Complexity of $\nabla\phi$ and $\nabla\phi^*$ for each constrained set.
  • ...and 13 more figures
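
The simplex parameterization in the Figure 3 caption (first $d$ coordinates, with the last one redundant) can be sketched in the same spirit. The example below assumes an entropic potential $\phi(x) = \sum_i x_i \log x_i + (1 - \sum_i x_i)\log(1 - \sum_i x_i)$, for which the mirror map is a log-ratio and its inverse is a softmax with one extra zero logit. The function names are hypothetical and this is not claimed to match the released implementation.

```python
import math


def simplex_mirror_map(x):
    """Assumed entropic mirror map: y_i = log(x_i) - log(1 - sum_j x_j).

    x holds the first d coordinates of an interior simplex point
    (x_i > 0 and sum(x) < 1); the last coordinate is implied.
    """
    rest = 1.0 - sum(x)
    if rest <= 0.0 or min(x) <= 0.0:
        raise ValueError("x must lie in the interior of the simplex")
    return [math.log(v) - math.log(rest) for v in x]


def simplex_inverse_mirror_map(y):
    """Inverse map: x_i = exp(y_i) / (1 + sum_j exp(y_j)).

    Logits are shifted by m = max(0, max(y)) for numerical stability;
    the implied last coordinate corresponds to a logit of zero.
    """
    m = max(0.0, max(y))
    exps = [math.exp(v - m) for v in y]
    denom = math.exp(-m) + sum(exps)
    return [e / denom for e in exps]


# Round trip on an interior point of Delta_3 (third coordinate is 0.3).
x = [0.2, 0.5]
y = simplex_mirror_map(x)
x_back = simplex_inverse_mirror_map(y)

# An arbitrary dual-space point maps to positive coordinates summing below 1.
x_new = simplex_inverse_mirror_map([3.0, -2.0])
```

In exact arithmetic the inverse map always returns an interior point; for very large logits, floating-point rounding can place the result on the boundary, so a practical implementation may want to clip.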

Theorems & Definitions (1)

  • Remark 1