Safety Without Semantic Disruptions: Editing-free Safe Image Generation via Context-preserving Dual Latent Reconstruction

Jordan Vice, Naveed Akhtar, Mubarak Shah, Richard Hartley, Ajmal Mian

TL;DR

This work tackles the challenge of safe image generation without semantically distorting the learned manifolds. It introduces an editing-free, context-preserving diffusion framework that leverages safe embeddings and a dual latent reconstruction with tunable safety, mediated by a global-context threshold $\tau_{gc}$. By combining two latent streams corresponding to unsafe and safe prompts, the method achieves state-of-the-art safety on the I2P benchmark while preserving proximal semantic structure, as quantified by the SaDi index, and offers intuitive control over safety levels. The approach generalizes across SD1.4/2.1 and is evaluated on I2P and ViSU, with analyses of proximal-concept disruptions and second-order statistics demonstrating reduced semantic shift compared to editing-based methods, making it a practically impactful path for ethically aligned generative AI.

Abstract

Training multimodal generative models on large, uncurated datasets can expose users to harmful, unsafe, controversial, or culturally inappropriate outputs. While model editing has been proposed to remove or filter undesirable concepts in embedding and latent spaces, it can inadvertently damage learned manifolds, distorting concepts in close semantic proximity. We identify limitations in current model editing techniques, showing that even benign, proximal concepts may become misaligned. To address the need for safe content generation, we leverage safe embeddings and a modified diffusion process with tunable weighted summation in the latent space to generate safer images. Our method preserves global context without compromising the structural integrity of the learned manifolds. We achieve state-of-the-art results on safe image generation benchmarks and offer intuitive control over the level of model safety. We also identify trade-offs between safety and censorship, a necessary perspective in the development of ethical AI models. We will release our code.

Keywords: Text-to-Image Models, Generative AI, Safety, Reliability, Model Editing

Paper Structure

This paper contains 15 sections, 14 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: (Top) Using concept removal for safe image generation disrupts the learned manifolds, causing semantic misalignment. Benign concepts in close proximity to removed concepts (e.g. red$\rightarrow$blood/violence) can then generate highly misaligned content, which we demonstrate using SafeCLIP [Poppi2024]. (Bottom) Our tunable, model editing-free method preserves manifold integrity while generating visually consistent, safe content.
  • Figure 2: (a) Typical text-to-image pipelines are susceptible to generating unsafe content when exposed to irresponsible prompts. (b) We introduce an editing-free, safe text-to-image pipeline that preserves global context and learned manifolds. An inappropriate content detector identifies the appropriate safety spectrum for each incoming embedding and outputs a safe embedding that drives our safe guidance. The unsafe and safe embeddings are then used in a piecewise reconstruction setup to guide safe image reconstruction. Combining the two latents is necessary to preserve the global visual context while still generating safe content. (c) Our dual latent reconstruction process is editing-free and preserves the global visual context of the generated image. A model provider outlines safety protocols, which are used to define labeled clusters within the text-encoder embedding space. The input detector determines the unsafe class (using I2P dataset labels [Schramowski2023]) and thus the appropriate safety spectrum. This decision informs the safe content guidance step. The irresponsible embedding is also retained to preserve the visual context of the generated image. Our piecewise denoising function is governed by a global context preservation threshold and similarity calculations. We combine latents to retain global information in early denoising steps and remove unsafe content during later (local) denoising steps. Controllable hyperparameters enable effective control over the required level of safety.
  • Figure 3: Using unconditional spaces for unsafe concept removal causes semantic disruptions to proximal concepts. We visualize how removing 'violence' results in semantic misalignment for "a chef slicing a piece of meat". The knife and meat are replaced, even though the prompt is non-violent.
  • Figure 4: Demonstration of hyper-parameter tuning. We visualize how different weight distributions impact safe image generation using our proposed modified latent reconstruction method.
  • Figure 5: Visualization of the evaluation methodology for (a) quantifying semantic disruptions in model editing methods, (b) using Q16 and NudeNet classifier [Schramowski2022, NudeNet2019] predictions to report safe image generation results. E and G denote encoder and generator.
  • ...and 11 more figures
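
The piecewise dual latent reconstruction described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that $\tau_{gc}$ is expressed as a fraction of the denoising schedule, that the early steps use only the original-prompt latent, and that later steps take a fixed weighted sum of the two latents. The names `blend_latents`, `toy_denoise`, `dual_latent_reconstruction`, and `w_safe` are hypothetical, the denoiser is a toy stand-in for a diffusion U-Net step, and the similarity calculations mentioned in the caption are omitted.

```python
from typing import List

Latent = List[float]  # plain lists stand in for latent tensors in this sketch


def blend_latents(z_orig: Latent, z_safe: Latent, step: int, num_steps: int,
                  tau_gc: float = 0.4, w_safe: float = 0.8) -> Latent:
    """Piecewise combination of the two latent streams at one denoising step.

    Assumption: tau_gc is a fraction of the schedule in [0, 1].
    """
    progress = step / num_steps
    if progress < tau_gc:
        # Early (global) steps: keep the original-prompt latent so that the
        # overall layout and visual context are preserved.
        return list(z_orig)
    # Later (local) steps: weighted sum that leans toward the safe stream.
    return [(1.0 - w_safe) * a + w_safe * b for a, b in zip(z_orig, z_safe)]


def toy_denoise(z: Latent, target: Latent, rate: float = 0.1) -> Latent:
    """Toy stand-in for one conditioned denoising step (not a real U-Net)."""
    return [zi + rate * (ti - zi) for zi, ti in zip(z, target)]


def dual_latent_reconstruction(e_orig: Latent, e_safe: Latent, z_init: Latent,
                               num_steps: int = 50, tau_gc: float = 0.4,
                               w_safe: float = 0.8) -> Latent:
    """Run both streams from the same latent and blend them at every step."""
    z = list(z_init)
    for step in range(num_steps):
        z_orig = toy_denoise(z, e_orig)   # guided by the original embedding
        z_safe = toy_denoise(z, e_safe)   # guided by the safe embedding
        z = blend_latents(z_orig, z_safe, step, num_steps, tau_gc, w_safe)
    return z


if __name__ == "__main__":
    e_orig, e_safe, z0 = [1.0, 1.0], [-1.0, -1.0], [0.0, 0.0]
    for tau in (0.0, 0.4, 1.0):
        print(tau, dual_latent_reconstruction(e_orig, e_safe, z0, tau_gc=tau))
```

In this toy setting, raising `tau_gc` keeps the result closer to the original-prompt target and lowering it (or raising `w_safe`) pulls the result toward the safe target, which mirrors the tunable safety control the captions describe.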