Data Redaction from Conditional Generative Models
Zhifeng Kong, Kamalika Chaudhuri
TL;DR
This work introduces a post-editing framework for data redaction in conditional generative models that distills and modifies only the conditioning network. Redacting a harmful conditional is formalized as replacing it with a reference non-redacted conditional, which can be done exactly in simple discrete-label cases and approximately, via distillation, for continuous representations. The authors develop model-specific distillation strategies for DM-GAN (text-to-image) and DiffWave (text-to-speech) and report better redaction quality and robustness than baseline methods while preserving generation quality, at a cost of hours rather than the days needed for full retraining. They also discuss safety considerations, adversarial robustness, limitations, and future directions. The approach is presented as a scalable, universal way to constrain conditioned generation without degrading non-redacted outputs.
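To make the discrete-label case concrete, here is a minimal sketch, not the authors' implementation. It assumes the conditioning network is a function from a label to an embedding, and it redacts a label by routing it to a reference label before that function is called. The names `redact_labels`, `condition`, and `reference_for` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, List

Embedding = List[float]


def redact_labels(
    condition: Callable[[Hashable], Embedding],
    reference_for: Dict[Hashable, Hashable],
) -> Callable[[Hashable], Embedding]:
    """Wrap a label-conditioning function so redacted labels use a reference label.

    `reference_for` maps each redacted label to the non-redacted label that
    should stand in for it (an illustrative assumption, not the paper's API).
    """

    def redacted_condition(label: Hashable) -> Embedding:
        # Redacted labels are replaced by their reference; others pass through.
        return condition(reference_for.get(label, label))

    return redacted_condition


# Toy embedding table standing in for a trained conditioning network.
embedding_table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.5, 0.5]}
condition = embedding_table.__getitem__

# Redact label 2 by replacing it with reference label 0.
safe_condition = redact_labels(condition, {2: 0})
```

Because the label set is finite, the replacement is exact: the wrapped function returns the reference embedding for every redacted label and is unchanged elsewhere. For continuous representations such as text prompts no such lookup exists, which is why the TL;DR describes distillation as the approximate route.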
Abstract
Deep generative models are known to produce undesirable samples such as harmful content. Traditional mitigation methods include re-training from scratch, filtering, or editing; however, these are either computationally expensive or can be circumvented by third parties. In this paper, we take a different approach and study how to post-edit an already-trained conditional generative model so that it redacts certain conditionals that will, with high probability, lead to undesirable content. This is done by distilling the conditioning network in the models, giving a solution that is effective, efficient, controllable, and universal for a class of deep generative models. We conduct experiments on redacting prompts in text-to-image models and redacting voices in text-to-speech models. Our method is computationally light and leads to better redaction quality and robustness than baseline methods, while still retaining high generation quality.
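One plausible form of the objective for distilling a conditioning network can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact loss: the squared-error distance, the weight `lam`, the single shared `reference` conditional, and all function names are hypothetical. The student is asked to match the teacher on conditionals that are kept, and to match the teacher's output on the reference conditional for those that are redacted.

```python
from typing import Callable, List, Sequence

Vector = List[float]
ConditioningNet = Callable[[str], Vector]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def redaction_distillation_loss(
    student: ConditioningNet,
    teacher: ConditioningNet,
    kept: Sequence[str],
    redacted: Sequence[str],
    reference: str,
    lam: float = 1.0,
) -> float:
    """Toy distillation objective for redaction (assumed form).

    keep term:   student(c) should equal teacher(c) for kept conditionals.
    redact term: student(c) should equal teacher(reference) for redacted ones.
    """
    reference_target = teacher(reference)
    keep_term = sum(
        squared_distance(student(c), teacher(c)) for c in kept
    ) / max(len(kept), 1)
    redact_term = sum(
        squared_distance(student(c), reference_target) for c in redacted
    ) / max(len(redacted), 1)
    return keep_term + lam * redact_term
```

In this sketch only the conditioning network would be trained; the rest of the generative model stays frozen, which is consistent with the abstract's claim that the method is computationally light. A student that reproduces the teacher everywhere incurs a nonzero redact term, while a student that maps every redacted conditional to the reference output drives the loss to zero.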
