Data Redaction from Conditional Generative Models

Zhifeng Kong, Kamalika Chaudhuri

TL;DR

This work introduces a post-editing framework for data redaction in conditional generative models by distilling and modifying only the conditioning networks. It formalizes redacting harmful conditionals as replacing them with reference non-redacted conditionals, achievable exactly for simple discrete-label cases and approximately via distillation for continuous representations. The authors develop model-specific distillation strategies for DM-GAN text-to-image and DiffWave text-to-speech, achieving superior redaction quality and robustness while preserving generation quality and efficiency (e.g., hours instead of days of full retraining). They also discuss safety considerations, adversarial robustness, limitations, and future directions, demonstrating practical impact for safer, more controllable generative systems. Overall, the approach provides a scalable, universal method to constrain conditioned generation without compromising non-redacted outputs, with strong empirical results in vision and audio domains.

Abstract

Deep generative models are known to produce undesirable samples such as harmful content. Traditional mitigation methods include re-training from scratch, filtering, or editing; however, these are either computationally expensive or can be circumvented by third parties. In this paper, we take a different approach and study how to post-edit an already-trained conditional generative model so that it redacts certain conditionals that will, with high probability, lead to undesirable content. This is done by distilling the conditioning network in the models, giving a solution that is effective, efficient, controllable, and universal for a class of deep generative models. We conduct experiments on redacting prompts in text-to-image models and redacting voices in text-to-speech models. Our method is computationally light and leads to better redaction quality and robustness than baseline methods, while still retaining high generation quality.
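
The idea of distilling only the conditioning network can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the teacher is a hypothetical linear map, the `reference` function simply zeroes a flag feature, and a least-squares fit stands in for gradient-based training of a neural student. The student copies the teacher on retained conditionals and, on redacted conditionals, copies what the teacher produces for the reference conditional. The rest of the generative model would stay untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical teacher conditioning network: a fixed linear map from a
# 4-dimensional conditional to a 3-dimensional conditioning vector.
W_teacher = rng.normal(size=(4, 3))

def teacher(c):
    return c @ W_teacher

def reference(c):
    """Assumed reference map: drop the redacted concept (last feature)."""
    ref = c.copy()
    ref[:, -1] = 0.0
    return ref

# Toy conditionals. The last feature flags the concept to redact.
retained = np.hstack([rng.normal(size=(200, 3)), np.zeros((200, 1))])
redacted = np.hstack([rng.normal(size=(20, 3)), np.ones((20, 1))])

# Distillation targets: copy the teacher on retained conditionals, and
# copy the teacher's output at the reference on redacted conditionals.
inputs = np.vstack([retained, redacted])
targets = np.vstack([teacher(retained), teacher(reference(redacted))])

# Fit the student by least squares (a stand-in for gradient-based training).
W_student, *_ = np.linalg.lstsq(inputs, targets, rcond=None)

def student(c):
    return c @ W_student

# Largest absolute deviations, before and after distillation.
gap_before = np.abs(teacher(redacted) - teacher(reference(redacted))).max()
gap_after = np.abs(student(redacted) - teacher(reference(redacted))).max()
drift = np.abs(student(retained) - teacher(retained)).max()
print(f"redacted gap before: {gap_before:.3f}, after: {gap_after:.2e}")
print(f"drift on retained conditionals: {drift:.2e}")
```

In this linear toy the student can match both targets exactly. With continuous text or speaker representations and a neural conditioning network, the match would only be approximate.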

Paper Structure (25 sections, 1 theorem, 16 equations, 28 figures, 8 tables)

This paper contains 25 sections, 1 theorem, 16 equations, 28 figures, 8 tables.

Key Result

Proposition 1

Let an invertible and smooth function $F:\mathbb{R}^d\rightarrow\mathbb{R}^d$ be an unconditional normalizing flow on the data space $\mathbb{R}^d$ that converts a standard Gaussian $\mathcal{N}$ to the output distribution $F_{\#}\mathcal{N}$. For any set $\mathcal{X}$ that has non-zero measure on …

Figures (28)

  • Figure 1: Redacting "white belly" from text-to-image models (Zhu et al., 2019). The prompt is "this bird has feathers that are black and has a white belly". (a) Sample generated from the pre-trained model, which visualizes the prompt. (b) The target sample, which redacts "white belly" but keeps the other concepts. (c) Sample generated from our redaction model, which aims to redact "white belly" and approximates the reference sample. (d) Sample generated from the Rewriting baseline, which is blurry and of lower quality. More samples can be found in the appendix.
  • Figure 2: Redacting labels 0,1,2,3 in cGAN on MNIST. Upper: samples generated from the pre-trained model. Lower: samples generated from the redacted model. Redacted conditionals (first two rows) are edited as expected, and other conditionals (last three rows) remain unchanged.
  • Figure 3: High-level architecture of DM-GAN.
  • Figure 4: High-level architecture of original and higher-capacity conditioning networks of DM-GAN.
  • Figure 5: Redacted prompt: "this particular bird has a white belly and breasts and black head and back". Reference prompt: "this particular bird has a black belly and breasts and black head and back".
  • ...and 23 more figures
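
The discrete-label setting of Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the embedding table, the helper `redact_labels`, and the assignment of reference labels (0→4, 1→5, 2→6, 3→7) are all hypothetical. Each redacted label is made to reuse the conditioning vector of a non-redacted reference label, so the redaction is exact and every other label is left unchanged.

```python
def redact_labels(embedding_table, reference_of):
    """Return a copy of the table in which every redacted label reuses
    the embedding of its (non-redacted) reference label."""
    edited = {label: list(vec) for label, vec in embedding_table.items()}
    for redacted_label, ref_label in reference_of.items():
        if ref_label in reference_of:
            raise ValueError("a reference label must not itself be redacted")
        edited[redacted_label] = list(embedding_table[ref_label])
    return edited

# Hypothetical 10-class label embedding (one 2-d vector per digit label).
table = {label: [float(label), float(label) ** 2] for label in range(10)}

# Hypothetical choice of reference labels for the redacted labels 0-3.
reference_of = {0: 4, 1: 5, 2: 6, 3: 7}

edited = redact_labels(table, reference_of)
for label in range(10):
    status = "redacted" if label in reference_of else "unchanged"
    print(label, edited[label], status)
```

Because only the label-to-embedding lookup changes, the generator that consumes these embeddings needs no retraining in this sketch.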

Theorems & Definitions (2)

  • Proposition 1
  • proof