Single-Model Attribution of Generative Models Through Final-Layer Inversion

Mike Laszkiewicz; Jonas Ricker; Johannes Lederer; Asja Fischer

TL;DR

The paper reframes single-model attribution in open-world settings as anomaly detection and introduces FLIPAD, which uses final-layer inversion to extract model-characteristic features and a convex, lasso-based optimization for efficient feature reconstruction. By feeding the reconstructed activations into a DeepSAD-style anomaly detector, FLIPAD achieves high attribution accuracy across GANs, diffusion models, and style-based generators, as well as on medical-imaging and tabular data, without altering generator training. The authors provide theoretical recovery guarantees for the proposed inversion under random 2D-convolutions and demonstrate robustness to perturbations and cross-domain generalization, highlighting practical implications for IP protection and governance of generative models.
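To make the anomaly-detection framing concrete, here is a minimal sketch of the two outer stages of such a pipeline (Figure 1, steps 1 and 3). It rests on several assumptions of ours: the generator's final activation is tanh, so optimization targets are recovered with arctanh; DeepSAD is replaced by a much simpler centroid-distance score; and the final-layer inversion itself (step 2) is skipped, with synthetic vectors standing in for the reconstructed features. `invert_tanh` and `CentroidScorer` are hypothetical names, not the authors' code.

```python
import numpy as np

def invert_tanh(x, eps=1e-6):
    """Step 1, assuming sigma_L = tanh: map samples in [-1, 1] to targets o."""
    # Clip away from +-1 so that arctanh stays finite
    return np.arctanh(np.clip(x, -1 + eps, 1 - eps))

class CentroidScorer:
    """Step 3 stand-in: squared distance to the mean feature of G's samples.

    A deliberately simple substitute for DeepSAD, which instead learns a
    neural map that pulls G-features toward a center and pushes labeled
    features from other sources away from it.
    """

    def fit(self, feats_g, quantile=0.95):
        self.center = feats_g.mean(axis=0)
        # Threshold chosen so that about 5% of G's own training features are flagged
        self.threshold = np.quantile(self.score(feats_g), quantile)
        return self

    def score(self, feats):
        return np.sum((feats - self.center) ** 2, axis=1)

    def is_from_g(self, feats):
        return self.score(feats) <= self.threshold

rng = np.random.default_rng(0)
feats_g = rng.normal(0.0, 1.0, size=(500, 8))      # synthetic features of G's samples
feats_other = rng.normal(3.0, 1.0, size=(200, 8))  # synthetic features of another source
scorer = CentroidScorer().fit(feats_g)
print(scorer.is_from_g(feats_other).mean())  # fraction of other-source samples attributed to G
```

A one-class score is the natural fit for the open-world setting: the detector only has to model what $G$ produces, so samples from generators never seen during training are still rejected.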

Abstract

Recent breakthroughs in generative modeling have sparked interest in practical single-model attribution. Such methods predict whether a sample was generated by a specific generator or not, for instance, to prove intellectual property theft. However, previous works are either limited to the closed-world setting or require undesirable changes to the generative model. We address these shortcomings by, first, viewing single-model attribution through the lens of anomaly detection. Arising from this change of perspective, we propose FLIPAD, a new approach for single-model attribution in the open-world setting based on final-layer inversion and anomaly detection. We show that the utilized final-layer inversion can be reduced to a convex lasso optimization problem, making our approach theoretically sound and computationally efficient. The theoretical findings are accompanied by an experimental study demonstrating the effectiveness of our approach and its flexibility to various domains.
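The reduction to a convex lasso problem can be illustrated with a short sketch. It assumes a linear final layer given as a dense matrix `G` and the objective $\frac{1}{2}\|G z - o\|_2^2 + \lambda \|z - \bar{z}_{L-1}\|_1$ (the paper's exact weighting and solver are not stated in this excerpt). The solver shown is ISTA (proximal gradient descent), a standard lasso method chosen here for brevity; `final_layer_lasso`, `z_bar`, and `lam` are hypothetical names.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: shrink every entry toward zero by t
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def final_layer_lasso(G, o, z_bar, lam, n_iter=2000):
    """ISTA for min_z 0.5 * ||G z - o||_2^2 + lam * ||z - z_bar||_1.

    With u = z - z_bar this is a standard lasso in u with target o - G z_bar,
    hence convex; ISTA alternates a gradient step and soft-thresholding.
    """
    r = o - G @ z_bar                        # part of o not explained by z_bar
    step = 1.0 / np.linalg.norm(G, 2) ** 2   # 1 / Lipschitz constant of the gradient
    u = np.zeros(G.shape[1])
    for _ in range(n_iter):
        grad = G.T @ (G @ u - r)
        u = soft_threshold(u - step * grad, step * lam)
    return z_bar + u

def objective(G, o, z, z_bar, lam):
    return 0.5 * np.sum((G @ z - o) ** 2) + lam * np.sum(np.abs(z - z_bar))

# Hypothetical final layer with more inputs (50) than outputs (20)
rng = np.random.default_rng(0)
G = rng.standard_normal((20, 50))
z_bar = rng.standard_normal(50)                   # stand-in for the mean activation
u_true = np.zeros(50)
u_true[[3, 17, 41]] = [2.0, -1.5, 1.0]            # sparse deviation from z_bar
o = G @ (z_bar + u_true)

z_hat = final_layer_lasso(G, o, z_bar, lam=0.1)
print(np.flatnonzero(np.abs(z_hat - z_bar) > 0.5))  # coordinates that moved substantially
```

Because the $\ell_1$ penalty is centred at $\bar{z}_{L-1}$ rather than at zero, the reconstruction stays at the mean activation except where the output forces a change, which is what makes the recovered deviations informative as features.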

Paper Structure

This paper contains 58 sections, 12 theorems, 47 equations, 16 figures, and 16 tables.

Introduction
Problem Setup
Related Work
Fingerprinting
Inversion
Watermarking
Other Related Work
Methodology
Leveraging Anomaly Detection for Single-Model Attribution
Layer Inversion Reveals Model-Characteristic Features
Introducing FLIPAD
Experiments
Setup
Results
Feature Extraction
...and 43 more sections

Key Result

Proposition 4.4

Let $G_L : \mathbb{R}^{D_{L-1}} \rightarrow \mathbb{R}^{D_L}$ be a surjective linear function and $o \in \mathbb{R}^{D_L}$. Furthermore, let $z_{L-1} \in \mathbb{R}^{D_{L-1}}$ be a solution to the linear system $G_L(z) = o$. Then every $z \in \mathcal{S} := z_{L-1} + \operatorname{ker}(G_L)$ solves the linear system $G_L(z) = o$, where $\operatorname{ker}(G_L) := \{ z \in \mathbb{R}^{D_{L-1}} : G_L(z) = 0 \}$ denotes the kernel of $G_L$.
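Proposition 4.4 and the geometry of Figure 3 can be checked numerically. The sketch below uses a random matrix as a hypothetical surjective layer $G_L$. Part 1 builds one particular solution with the pseudoinverse and shows that adding any kernel element keeps $G_L(z) = o$. Part 2 treats the simplest case of a single linear constraint (our choice for illustration), where the $\ell_1$-closest solution to $\bar{z}_{L-1}$ changes only one coordinate. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Part 1: Proposition 4.4 with a hypothetical surjective layer G_L: R^5 -> R^3
G_L = rng.standard_normal((3, 5))          # full row rank with probability 1
o = rng.standard_normal(3)
z_particular = np.linalg.pinv(G_L) @ o     # one solution of G_L z = o

_, s, Vt = np.linalg.svd(G_L)
rank = int(np.sum(s > 1e-10))
kernel_basis = Vt[rank:].T                 # columns span ker(G_L)

w = rng.standard_normal(kernel_basis.shape[1])
z_other = z_particular + kernel_basis @ w  # another element of S
print(np.allclose(G_L @ z_other, o))       # prints True

# Part 2: the l1 geometry of Figure 3 for a single constraint a @ z = b
def l1_closest_solution(a, b, z_bar):
    """Solve min_z ||z - z_bar||_1 subject to a @ z = b.

    The smallest l1-diamond around z_bar first touches the hyperplane at a
    vertex, so only the coordinate with the largest |a_k| moves.
    """
    r = b - a @ z_bar
    k = int(np.argmax(np.abs(a)))
    z_hat = np.array(z_bar, dtype=float)
    z_hat[k] += r / a[k]
    return z_hat

a = np.array([3.0, 1.0, 1.0])              # hypothetical G_L: R^3 -> R
z_bar = np.array([0.5, 0.2, -0.1])         # hypothetical mean activation
z_hat = l1_closest_solution(a, 2.0, z_bar)
print(z_hat - z_bar)                       # only the first entry is nonzero
```

Part 2 mirrors the caption of Figure 3: the second and third components of the solution coincide with those of $\bar{z}_{L-1}$.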

Figures (16)

Figure 1: Single-model attribution with FLIPAD. Given a generative model $G$, FLIPAD performs the following steps: 1) The training data includes generated samples $x_G^{(i)}$ from $G$ and samples $x^{(i)}_{G'}$ from a different source. For each $x$, we compute the optimization target $o$ by inverting the final activation $\sigma_L$. 2) For each output $o$, we perform final-layer inversion by finding an activation $\hat{z}_{L-1}$ that is close to the expected activation $\bar{z}_{L-1}$ and approximately solves $o \approx G_L(\hat{z}_{L-1})$. 3) Since final-layer inversion reveals differences between data sources, the reconstructed activations can be used as features to train an anomaly detector.
Figure 2: The two dimensions of the model attribution problem. Let $A$ be the attribution method. While single-model attribution solves a binary decision problem ($G_0$ or something else?), multi-model attribution can distinguish between more than two classes. In an open-world setting, it is also possible that a sample stems from an unknown generator.
Figure 3: Geometry of the optimization problem $\min_{z} \|z - \bar{z}_{L-1}\|_1$ subject to $G_L(z) = o$. According to Proposition 4.4, the solution sets $\mathcal{S}$ (yellow) and $\mathcal{S}^\prime$ (green) for outputs $o$ and $o^\prime$, respectively, are shifted versions of one another. The solution $\hat{z}_{L-1}$ is the point where the smallest $\ell_1$-diamond (blue) around $\bar{z}_{L-1}$ touches the solution set $\mathcal{S}$. In particular, the second and third components of $\hat{z}_{L-1}$, i.e., the ones corresponding to the $y$- and $z$-axes, coincide with the corresponding components of $\bar{z}_{L-1}$.
Figure 4: Cherry-picked channel $c$ of the average features reconstructed with the lasso-based final-layer inversion when $G$ is a DCGAN. The left-most panel shows the average activation $\bar{z}_c$ of channel $c$, and the remaining panels show the average reconstructed feature over DCGAN, real, WGAN-GP, LSGAN, and EBGAN samples, respectively.
Figure 5: Convolutional arithmetic as matrix multiplication. Each row shows one convolutional operation. Left: Conventional illustration of a 2D-convolution. Right: 2D-convolution as matrix multiplication $G\cdot x$.
...and 11 more figures
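Figure 5 views a 2D-convolution as a matrix product $G \cdot x$, which is what allows the final layer to be treated as a linear map $G_L$. The sketch below makes this explicit for a single channel with stride 1 and no padding (simplifying assumptions; real generators use many channels and typically a transposed convolution, whose matrix is the transpose of a convolution matrix of this kind). `conv2d_direct` and `conv2d_matrix` are illustrative names.

```python
import numpy as np

def conv2d_direct(x, k):
    """Valid 2D cross-correlation (the 'convolution' of deep-learning libraries)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv2d_matrix(k, H, W):
    """Build G such that G @ x.ravel() == conv2d_direct(x, k).ravel()."""
    kh, kw = k.shape
    oh, ow = H - kh + 1, W - kw + 1
    G = np.zeros((oh * ow, H * W))
    for i in range(oh):
        for j in range(ow):
            row = i * ow + j                   # one row per output pixel
            for a in range(kh):
                for b in range(kw):
                    G[row, (i + a) * W + (j + b)] = k[a, b]
    return G

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 4))
k = rng.standard_normal((3, 3))
G = conv2d_matrix(k, 4, 4)
print(G.shape)                                               # (4, 16)
print(np.allclose(G @ x.ravel(), conv2d_direct(x, k).ravel()))  # True
```

Each row of `G` contains the kernel weights placed at the input positions covered by one output pixel, matching the row-by-row correspondence shown in the figure.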

Theorems & Definitions (28)

Example 4.1: Impossible Output
Example 4.2: Unlikely Hidden Representation
Example 4.3: Structured Hidden Representation
Proposition 4.4: see, e.g., Hefferon (2012), Lemma 3.7
Theorem 4.5
proof
Definition 3.1
Definition 3.2
Theorem 3.3: Noisy Recovery candes_rip
Definition 3.4
...and 18 more
