Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models

Rohit Gandikota, Joanna Materzynska, Tingrui Zhou, Antonio Torralba, David Bau

TL;DR

This work introduces Concept Sliders, a plug-and-play framework of low-rank (LoRA) adaptors for diffusion models that enables precise, continuous, and composable control over textual and visual concepts. By optimizing low-rank directions with a disentanglement objective and scaling slider strength at inference time, the method achieves targeted edits with less interference than prior approaches. The authors demonstrate textual and visual concept sliders, transfer of StyleGAN latents, and multi-slider composition, and show practical benefits such as fixing distorted hands and improving realism in SDXL outputs. Experiments, ablations, and user studies support the usefulness and robustness of the approach, and the code and trained sliders are publicly released.

Abstract

We present a method to create interpretable concept sliders that enable precise control over attributes in image generations from diffusion models. Our approach identifies a low-rank parameter direction corresponding to one concept while minimizing interference with other attributes. A slider is created using a small set of prompts or sample images; thus slider directions can be created for either textual or visual concepts. Concept Sliders are plug-and-play: they can be composed efficiently and continuously modulated, enabling precise control over image generation. In quantitative experiments comparing to previous editing techniques, our sliders exhibit stronger targeted edits with lower interference. We showcase sliders for weather, age, styles, and expressions, as well as slider compositions. We show how sliders can transfer latents from StyleGAN for intuitive editing of visual concepts for which textual description is difficult. We also find that our method can help address persistent quality issues in Stable Diffusion XL including repair of object deformations and fixing distorted hands. Our code, data, and trained sliders are available at https://sliders.baulab.info/
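
The plug-and-play behavior described in the abstract can be illustrated with a toy sketch. It assumes each slider stores a low-rank update B·A for a frozen weight matrix W, so that the edited weight is W + α·B·A for a user-chosen strength α, and that several sliders compose by summing their scaled updates. The function names, the tiny matrices, and the plain-list representation are illustrative assumptions, not the authors' released code.

```python
# Toy sketch (assumption): a slider is a low-rank update B @ A added to a
# frozen weight W, scaled by a user-chosen strength alpha at inference time.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def apply_sliders(weight, sliders):
    """Return weight + sum(alpha * B @ A) over (alpha, B, A) sliders.

    The input weight is copied, not modified, so alpha = 0 recovers the base model.
    """
    edited = [row[:] for row in weight]
    for alpha, b, a in sliders:
        delta = matmul(b, a)  # rank is at most the inner dimension of B and A
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                edited[i][j] += alpha * value
    return edited

# Hypothetical 2x2 frozen weight and two rank-1 sliders, each stored as (B, A).
W = [[1.0, 0.0], [0.0, 1.0]]
age = ([[1.0], [0.0]], [[0.0, 2.0]])    # B is 2x1, A is 1x2
smile = ([[0.0], [1.0]], [[3.0, 0.0]])

base = apply_sliders(W, [(0.0, *age)])                            # slider switched off
older = apply_sliders(W, [(1.5, *age)])                           # positive direction
younger_smiling = apply_sliders(W, [(-1.0, *age), (0.5, *smile)]) # composition
```

A negative α moves along the same direction with the opposite sign, which is how a single slider could cover both ends of an attribute in this sketch.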

Paper Structure (34 sections, 9 equations, 32 figures, 4 tables)


Figures (32)

  • Figure 1: Given a small set of text prompts or paired image data, our method identifies low-rank directions in diffusion parameter space for targeted concept control with minimal interference to other attributes. These directions can be derived from pairs of opposing textual concepts or artist-created images, and they are composable for complex multi-attribute control. We demonstrate the effectiveness of our method by fixing distorted hands in Stable Diffusion outputs and transferring disentangled StyleGAN latents into diffusion models.
  • Figure 2: Concept Sliders are created by fine-tuning LoRA adaptors using a guided score that enhances attribute $c_+$ while suppressing attribute $c_-$ from the target concept $c_t$. The slider model generates samples $x_t$ by partially denoising Gaussian noise over time steps 1 to $t$, conditioned on the target concept $c_t$.
  • Figure 3: Our text-based sliders allow precise editing of desired attributes during image generation while maintaining the overall structure. Traversing the sliders towards the negative direction produces an opposing effect on the attributes.
  • Figure 4: Controlling fine-grained attributes like eyebrow shape and eye size using image-pair-driven concept sliders with optional text guidance. The eye size slider scales from small to large eyes using the Ostris dataset.
  • Figure 5: We demonstrate transferring StyleGAN style space latents to the diffusion latent space. We identify three neurons that edit facial structure: neuron 77 controls cheekbone structure, neuron 646 selectively adjusts the left side face width, and neuron 847 edits inter-ocular distance. We transfer these StyleGAN latents to the diffusion model to enable structured facial editing.
  • ...and 27 more figures
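
The Figure 2 caption describes a guided score that enhances $c_+$ while suppressing $c_-$ for the target concept $c_t$. One plausible reading, stated here as an assumption because the caption does not give the formula, is that the training target shifts the frozen model's noise prediction for $c_t$ by a scaled difference between its predictions for $c_+$ and $c_-$, and the LoRA adaptor is fit to that target with a squared error. The guidance weight `eta`, the function names, and the two-element noise vectors below are hypothetical.

```python
# Sketch (assumption): build a guided training target from three noise
# predictions of a frozen model, then score a slider prediction against it.

def guided_target(eps_target, eps_plus, eps_minus, eta=1.0):
    """Shift the c_t prediction toward c_plus and away from c_minus."""
    return [t + eta * (p - m) for t, p, m in zip(eps_target, eps_plus, eps_minus)]

def mse(prediction, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)

# Hypothetical noise predictions for the same noisy sample x_t.
eps_target = [0.5, -1.0]  # conditioned on c_t
eps_plus = [1.0, 0.0]     # conditioned on c_+
eps_minus = [0.0, 0.5]    # conditioned on c_-

target = guided_target(eps_target, eps_plus, eps_minus, eta=2.0)
slider_prediction = [2.5, -1.0]  # stand-in for the LoRA-adapted model's output
loss = mse(slider_prediction, target)
```

With `eta = 0` the target reduces to the unmodified prediction for $c_t$, so in this sketch the guidance weight sets how strongly the slider is pushed along the concept direction during training.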