
CoBELa: Steering Transparent Generation via Concept Bottlenecks on Energy Landscapes

Sangwon Kim, Kyoungoh Lee, Jeyoun Dong, Kwang-Ju Kim

TL;DR

CoBELa (Concept Bottlenecks on Energy Landscapes) is a decoder-free, energy-based framework that eliminates non-explicit bottleneck representations by conditioning generation entirely through per-concept energy functions over the latent space of a frozen pretrained generator, requiring no generator retraining and enabling post-hoc interpretation.

Abstract

Generative concept bottleneck models aim to enable interpretable generation by routing synthesis through explicit, user-facing concepts. In practice, prior approaches often rely on non-explicit bottleneck representations (e.g., vision cues or opaque concept embeddings) or on black-box decoders to preserve image quality, which weakens transparency. We propose CoBELa (Concept Bottlenecks on Energy Landscapes), a decoder-free, energy-based framework that eliminates non-explicit bottleneck representations by conditioning generation entirely through per-concept energy functions over the latent space of a frozen pretrained generator; it requires no generator retraining and enables post-hoc interpretation. Because these concept energies compose additively, CoBELa naturally supports compositional concept interventions: concept conjunction and negation are realized by summing or subtracting per-concept energy terms without additional training. A diffusion-scheduled energy guidance scheme further replaces expensive MCMC chains with more stable, scheduled denoising, enabling efficient concept-steered sampling. Experiments on CelebA-HQ and CUB-200-2011 demonstrate improvements over prior concept bottleneck generative models, with concept accuracies of 75.70% and 82.42% and FIDs of 6.47 and 5.37, respectively, while enabling reliable multi-concept interventions.
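
The additive composition described above can be illustrated with a minimal sketch. It assumes a hypothetical per-concept energy $E_k(v) = -\log \sigma(a_k^\top v + b_k)$ over a toy two-dimensional latent; the `ConceptEnergy` class, the concept directions, and the weights are illustrative and are not the paper's energy network $E_\theta$. Conjunction uses weight $+1$ and negation uses weight $-1$, so the composed energy is $E(v) = \sum_k w_k E_k(v)$ and its gradient is the same weighted sum of per-concept gradients. Plain fixed-step gradient descent stands in for sampling here; it is not CoBELa's diffusion-scheduled sampler.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class ConceptEnergy:
    """Hypothetical concept energy E_k(v) = -log sigmoid(a_k . v + b_k).

    Low energy means the concept is likely present in latent v.
    """

    def __init__(self, direction: Sequence[float], bias: float = 0.0) -> None:
        self.direction = list(direction)
        self.bias = bias

    def logit(self, v: Sequence[float]) -> float:
        return sum(a * x for a, x in zip(self.direction, v)) + self.bias

    def score(self, v: Sequence[float]) -> float:
        # Probability-like concept score in [0, 1].
        return sigmoid(self.logit(v))

    def energy(self, v: Sequence[float]) -> float:
        # -log sigmoid(z) == softplus(-z)
        return softplus(-self.logit(v))

    def grad(self, v: Sequence[float]) -> List[float]:
        # dE_k/dv = -(1 - sigmoid(z)) * a_k
        coeff = -(1.0 - self.score(v))
        return [coeff * a for a in self.direction]


def composed_energy(concepts, weights, v) -> float:
    """E(v) = sum_k w_k E_k(v): w_k = +1 for conjunction, -1 for negation."""
    return sum(w * c.energy(v) for c, w in zip(concepts, weights))


def composed_grad(concepts, weights, v) -> List[float]:
    # The gradient of a weighted sum is the weighted sum of the gradients.
    total = [0.0] * len(v)
    for c, w in zip(concepts, weights):
        for i, g in enumerate(c.grad(v)):
            total[i] += w * g
    return total


def descend(concepts, weights, v, step=0.5, iters=20) -> List[float]:
    """Fixed-step gradient descent on E(v); a stand-in for sampling only."""
    v = list(v)
    for _ in range(iters):
        g = composed_grad(concepts, weights, v)
        v = [vi - step * gi for vi, gi in zip(v, g)]
    return v


# Usage: two hypothetical, orthogonal concept directions in a 2-D latent.
smile = ConceptEnergy([1.0, 0.0])
glasses = ConceptEnergy([0.0, 1.0])
concepts = [smile, glasses]
v_and = descend(concepts, [1.0, 1.0], [0.0, 0.0])   # smile AND glasses
v_not = descend(concepts, [1.0, -1.0], [0.0, 0.0])  # smile AND NOT glasses
```

Because the weights enter linearly, flipping a single $w_k$ from $+1$ to $-1$ changes only that concept's contribution to the gradient, which is what makes training-free conjunction and negation possible in this formulation.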

Paper Structure

This paper contains 16 sections, 11 equations, 4 figures, 2 tables, and 1 algorithm.

Figures (4)

  • Figure 1: Concept bottleneck architectures for generation ($\boldsymbol{+}$: concatenation). (a) CBGM trains the generator end-to-end, concatenating concept scores with non-explicit bottleneck representations (concept embeddings and vision cues) into a single representation. (b) CB-AE freezes the generator but relies on an encoder-decoder and on non-explicit bottleneck representations (vision cues) that bypass the bottleneck. (c) CoBELa eliminates both the decoder and the non-explicit bottleneck representations; the per-concept energies ($\Sigma$) and their gradient $\nabla_v E_\theta$ reconstruct the latent directly.
  • Figure 2: Overview of the CoBELa framework. A frozen pretrained generator is split into a mapping network $g_1$ and a synthesis network $g_2$. During training, the intermediate latent $v$ is noised and fed to the energy network $E_\theta$ along with learnable concept embeddings to produce per-concept scores (the interpretable bottleneck) and energies, supervised by score-matching and concept losses. At inference, concept-weighted energy gradients guide DDIM [ddim] denoising to steer generation toward user-specified attributes.
  • Figure 3: Human-in-the-loop concept intervention on CelebA-HQ [celeba-hq]. Each row shows an original generation (left), the concept scores produced by the interpretable bottleneck (bar charts), and the results of user-specified interventions (right). By default, all $K$ concepts operate at positive weight $w^+$, forming an implicit conjunction that guides generation to reflect the full concept set. The bar charts serve as explicit, human-readable explanations of the current generation: a user can inspect which concepts are active and why the image looks as it does. To intervene, the user flips selected concepts to negative weight $w^-$ (negation, $\neg$). The key finding is that negating multiple concepts simultaneously remains reliable: targeted attributes change as expected while non-targeted scores and facial identity are preserved. This demonstrates transparent, interpretable control grounded in explicit semantic explanations.
  • Figure 4: Reconstruction comparison on CUB [Wah2011]. (a) Original generation from StyleGAN2 [stygan2], (b) reconstruction by the competing CB-AE [cbae] baseline, and (c) reconstruction by CoBELa. Our method better preserves semantic fidelity while reducing artifacts.
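
The inference path in Figure 2, where concept-weighted energy gradients guide DDIM denoising, can be sketched under strong simplifying assumptions. The latent prior is taken to be a standard Gaussian, so the noise prediction has the closed form $\sqrt{1-\bar\alpha_t}\, v_t$ and stands in for the frozen generator's denoiser. Each concept head is a hypothetical logistic energy that ignores the timestep. The gradient $\sum_k w_k \nabla_v E_k$ is added to the noise prediction with a classifier-guidance-style $\sqrt{1-\bar\alpha_t}$ rescaling. The linear beta schedule, the guidance `scale`, and all names are illustrative assumptions, not details taken from the paper; weights $+1$ and $-1$ play the roles of $w^+$ and $w^-$ in Figure 3.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class ConceptEnergy:
    """Hypothetical concept head: E_k(v) = -log sigmoid(a_k . v), timestep-agnostic."""

    def __init__(self, direction: Sequence[float]) -> None:
        self.direction = list(direction)

    def score(self, v: Sequence[float]) -> float:
        return sigmoid(sum(a * x for a, x in zip(self.direction, v)))

    def grad(self, v: Sequence[float]) -> List[float]:
        # dE_k/dv = -(1 - sigmoid(a_k . v)) * a_k
        coeff = -(1.0 - self.score(v))
        return [coeff * a for a in self.direction]


def make_alpha_bars(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    # Cumulative products of (1 - beta_t) for a linear beta schedule.
    alpha_bars, prod = [], 1.0
    for i in range(T):
        prod *= 1.0 - (beta_start + (beta_end - beta_start) * i / (T - 1))
        alpha_bars.append(prod)
    return alpha_bars


def ddim_concept_guided(x_T, concepts, weights, scale=3.0, num_steps=50, T=1000):
    """Deterministic DDIM (eta = 0) with concept-weighted energy-gradient guidance."""
    alpha_bars = make_alpha_bars(T)
    timesteps = list(range(T - 1, -1, -(T // num_steps)))
    x = list(x_T)
    for i, t in enumerate(timesteps):
        a_t = alpha_bars[t]
        a_prev = alpha_bars[timesteps[i + 1]] if i + 1 < len(timesteps) else 1.0
        sig_t = math.sqrt(1.0 - a_t)
        # Toy prior: for v_0 ~ N(0, I) the optimal noise prediction is sqrt(1 - a_t) * v_t.
        eps = [sig_t * xi for xi in x]
        # Guidance: eps += scale * sqrt(1 - a_t) * sum_k w_k * grad E_k(v_t).
        for concept, w in zip(concepts, weights):
            for j, gj in enumerate(concept.grad(x)):
                eps[j] += scale * w * sig_t * gj
        # Predict the clean latent, then take the deterministic DDIM step.
        x0 = [(xi - sig_t * ei) / math.sqrt(a_t) for xi, ei in zip(x, eps)]
        x = [math.sqrt(a_prev) * x0j + math.sqrt(1.0 - a_prev) * ej for x0j, ej in zip(x0, eps)]
    return x


# Usage: shared starting noise, two hypothetical concepts in a 4-D latent.
rng = random.Random(0)
x_T = [rng.gauss(0.0, 1.0) for _ in range(4)]
smile = ConceptEnergy([1.0, 0.0, 0.0, 0.0])
glasses = ConceptEnergy([0.0, 1.0, 0.0, 0.0])
concepts = [smile, glasses]
base = ddim_concept_guided(x_T, concepts, [0.0, 0.0])      # unguided
steered = ddim_concept_guided(x_T, concepts, [1.0, -1.0])  # smile AND NOT glasses
```

In this toy setting the latent coordinates orthogonal to both concept directions follow exactly the same trajectory with and without guidance, a simplified analogue of the preservation of non-targeted attributes described for Figure 3.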