Concept-TRAK: Understanding how diffusion models learn concepts through concept-level attribution

Yonghyun Park, Chieh-Hsin Lai, Satoshi Hayakawa, Yuhta Takida, Naoki Murata, Wei-Hsiang Liao, Woosung Choi, Kin Wai Cheuk, Junghyun Koo, Yuki Mitsufuji

TL;DR

This work introduces concept-level attribution through a novel method called Concept-TRAK, which extends influence functions with a key innovation: specialized training and utility loss functions designed to isolate concept-specific influences rather than overall reconstruction quality.

Abstract

While diffusion models excel at image generation, their growing adoption raises critical concerns about copyright issues and model transparency. Existing attribution methods identify training examples influencing an entire image, but fall short in isolating contributions to specific elements, such as styles or objects, that are of primary concern to stakeholders. To address this gap, we introduce concept-level attribution through a novel method called Concept-TRAK, which extends influence functions with a key innovation: specialized training and utility loss functions designed to isolate concept-specific influences rather than overall reconstruction quality. We evaluate Concept-TRAK on novel concept attribution benchmarks using Synthetic and CelebA-HQ datasets, as well as the established AbC benchmark, showing substantial improvements over prior methods in concept-level attribution scenarios. We further demonstrate its versatility on real-world text-to-image generation with compositional and multi-concept prompts.

Paper Structure

This paper contains 72 sections, 27 equations, 18 figures, 7 tables, 2 algorithms.

Figures (18)

  • Figure 1: (a) Traditional attribution methods like TRAK identify training samples that influenced an entire generated image, often yielding influences unrelated to specific concepts of interest. (b) Our Concept-TRAK identifies training samples that specifically influenced a targeted concept (e.g., "Pikachu"), enabling precise attribution for features of interest.
  • Figure 2: (a) Global concept attribution identifies training samples that influenced the learning of general concepts across all generations. (b) Local concept attribution identifies training samples that influenced the learning of specific concept manifestations appearing in a particular generated image. For example, when applying local concept attribution to the "dog" concept in a generated image of a bulldog-like dog, we can observe that it retrieves images similar to bulldogs, demonstrating more targeted attribution.
  • Figure 3: Experimental setup. (a) Train diffusion models on image–tuple pairs (shape, color), excluding all red–triangle combinations. (b) Generate ID/OOD samples and perform concept-level attribution; the prediction is correct if the top influential training samples contain the target concept.
  • Figure 4: The target concept for each method is indicated in parentheses (Shape/Color). A data attribution method succeeds when the top influential training samples contain the same concept as the generated sample. (a) In-distribution case: Both the baseline methods and our approach successfully retrieve relevant training samples. (b) Out-of-distribution case: Our method accurately retrieves training samples for each individual concept (triangle for shape, red for color), while the baselines retrieve samples related to only one concept because they attribute at the image level.
  • Figure 5: Precision@10 on the synthetic dataset.
  • ...and 13 more figures
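
The success criterion in the captions for Figures 3 and 4 (a prediction counts as correct when the top influential training samples contain the target concept) and the Precision@10 metric of Figure 5 can be illustrated with a small sketch. This is not the authors' evaluation code: the function name `precision_at_k`, the per-sample attribution scores, and the boolean concept labels are all assumptions made for illustration, and the paper's exact protocol may differ.

```python
from typing import Sequence


def precision_at_k(scores: Sequence[float],
                   has_concept: Sequence[bool],
                   k: int = 10) -> float:
    """Fraction of the k highest-scoring training samples that contain
    the target concept.

    scores[i]      -- hypothetical attribution score of training sample i
    has_concept[i] -- True if training sample i contains the target concept
    """
    if len(scores) != len(has_concept):
        raise ValueError("scores and has_concept must have the same length")
    if len(scores) == 0 or k <= 0:
        raise ValueError("need at least one sample and a positive k")

    # If fewer than k samples exist, evaluate on all of them.
    k = min(k, len(scores))

    # Indices of the k most influential samples (highest score first).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    return sum(1 for i in top if has_concept[i]) / k


# Toy example: five training samples, target concept "triangle".
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
has_triangle = [True, False, True, True, False]
p_at_3 = precision_at_k(scores, has_triangle, k=3)  # top-3 are samples 0, 2, 4
```

In the toy example, two of the three highest-scoring samples contain the concept, so the value is 2/3. With `k=10` and real attribution scores, the same function would give a Precision@10 figure under the assumptions stated above.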