ConceptWeaver: Weaving Disentangled Concepts with Flow

Jintao Chen, Aiming Hao, Xiaoqing Chen, Chengyu Bai, Chubin Chen, Yanxun Li, Jiahong Wu, Xiangxiang Chu, Shanghang Zhang

Abstract

Pre-trained flow-based models excel at synthesizing complex scenes yet lack a direct mechanism for disentangling and customizing their underlying concepts from one-shot real-world sources. To demystify this process, we first introduce a novel differential probing technique to isolate and analyze the influence of individual concept tokens on the velocity field over time. This investigation yields a critical insight: the generative process is not monolithic but unfolds in three distinct stages. An initial Blueprint Stage establishes low-frequency structure, followed by a pivotal Instantiation Stage where content concepts emerge with peak intensity and become naturally disentangled, creating an optimal window for manipulation. A final concept-insensitive Refinement Stage then synthesizes fine-grained details. Guided by this discovery, we propose ConceptWeaver, a framework for one-shot concept disentanglement. ConceptWeaver learns concept-specific semantic offsets from a single reference image using a stage-aware optimization strategy that aligns with the three-stage framework. These learned offsets are then deployed during inference via our novel ConceptWeaver Guidance (CWG) mechanism, which strategically injects them at the appropriate generative stage. Extensive experiments validate that ConceptWeaver enables high-fidelity, compositional synthesis and editing, demonstrating that understanding and leveraging the intrinsic, staged nature of flow models is key to unlocking precise, multi-granularity content manipulation.
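
As a rough illustration of the differential probing idea in the abstract, the Python sketch below measures how much a toy velocity field changes when one concept token is dropped from the prompt, at several timesteps. Everything in it is an assumption made for illustration: the `toy_velocity` function, the `TOKENS` table with its time profiles, and the use of token ablation as the probe are hypothetical stand-ins, not the paper's model or its exact procedure. Time runs from t = 1 (noise) to t = 0 (image), following the x_1 to x_0 convention of Figure 4.

```python
import math

# Hypothetical toy tokens: each has a direction and a time profile w(t) on [0, 1].
# The profiles are invented so that one token dominates early and one peaks mid-way.
TOKENS = {
    "riding a bike": ([1.0, 0.0], lambda t: t),
    "panda": ([0.0, 1.0], lambda t: math.exp(-((t - 0.5) ** 2) / 0.02)),
}

def toy_velocity(x, t, active):
    """Stand-in for a prompt-conditioned velocity field v(x, t | prompt)."""
    v = [-xi for xi in x]
    for name in active:
        direction, weight = TOKENS[name]
        w = weight(t)
        v = [vi + w * di for vi, di in zip(v, direction)]
    return v

def probe(x, t, concept, prompt):
    """Norm of the velocity change caused by dropping one concept token."""
    full = toy_velocity(x, t, prompt)
    ablated = toy_velocity(x, t, [p for p in prompt if p != concept])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(full, ablated)))

prompt = ["riding a bike", "panda"]
timesteps = [1.0, 0.75, 0.5, 0.25, 0.0]  # from noise (t = 1) to image (t = 0)
curves = {c: [probe([0.3, -0.2], t, c, prompt) for t in timesteps] for c in prompt}

for concept, values in curves.items():
    print(concept, [round(v, 3) for v in values])
```

In this toy setup the structural token's influence is largest near t = 1 and the content token's influence peaks mid-trajectory, which mirrors the staged behavior the abstract describes only because the time profiles were chosen that way.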

Paper Structure

This paper contains 38 sections, 7 equations, 18 figures, 2 tables.

Figures (18)

  • Figure 1: Compositional Synthesis with ConceptWeaver. Our framework learns a visual concept, like a shirt pattern, from a single reference image by optimizing a personalized semantic offset. Our stage-aware CWG then uses this offset to inject the concept into new scenes, enabling compositional synthesis across diverse contexts.
  • Figure 2: Probing Concept Formation Dynamics. While standard prompt guidance (a) provides a dense, entangled signal, our differential probing technique (b) isolates the dynamic influence of individual concepts. This analysis uncovers a consistent three-stage framework (c): an early Blueprint Stage dominated by structural concepts (e.g., "riding a bike"), a pivotal intermediate Instantiation Stage where content concepts (e.g., "panda") are decoupled and peak in intensity, and a final Refinement Stage where signals fade.
  • Figure 3: Flowchart of the proposed ConceptWeaver. Our framework consists of two main phases: training and inference. (a) During training, a lightweight Semantic Offset Module learns to represent a visual concept by adding a learnable offset to the key/value pairs of its corresponding concept token. This module is optimized with a stage-aware loss. (b) During inference, our ConceptWeaver Guidance (CWG) applies these offsets in a stage-aware manner: structural offsets are injected during the Blueprint Stage, and content offsets during the Instantiation Stage, enabling precise compositional control.
  • Figure 4: Mechanism of ConceptWeaver Guidance (CWG). Our guidance modifies the standard generative path (to $\boldsymbol{x}_0$) by injecting learned concept shifts (red vectors) at stage-aware intervals. This timed intervention, visualized by the heatmaps, steers the trajectory from noise ($\boldsymbol{x}_1$) to a customized target ($\boldsymbol{x}_0'$), enabling precise compositional control.
  • Figure 5: Comparison on multi-concept composition generation. The figure compares how well different methods combine multiple concepts, focusing on overfitting to the original reference concepts and the ability to disentangle target concepts.
  • ...and 13 more figures
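
The stage-aware injection described in the captions for Figures 3 and 4 can be sketched as a timed schedule inside a simple Euler sampler. This is a minimal sketch under stated assumptions, not the paper's implementation: the stage boundaries (0.7 and 0.3), the function names, and the constant toy velocity are all hypothetical, and the learned key/value offsets are modeled here only by their net effect as additive velocity shifts. Time runs from t = 1 (noise, x_1) to t = 0 (image, x_0).

```python
# Hypothetical stage boundaries on t in [0, 1]; the source gives no numeric values.
BLUEPRINT_END = 0.7      # t > 0.7: Blueprint Stage (structural shifts active)
INSTANTIATION_END = 0.3  # 0.3 < t <= 0.7: Instantiation Stage (content shifts active)

def stage_of(t):
    """Map a timestep to one of the three stages."""
    if t > BLUEPRINT_END:
        return "blueprint"
    if t > INSTANTIATION_END:
        return "instantiation"
    return "refinement"

def guided_velocity(v_base, t, shifts, scale=1.0):
    """Add the shift registered for the current stage, if any."""
    shift = shifts.get(stage_of(t))
    if shift is None:
        return list(v_base)
    return [v + scale * s for v, s in zip(v_base, shift)]

def sample(x1, v_fn, shifts, steps=10):
    """Euler integration from t = 1 down to t = 0 with stage-aware shifts."""
    x = list(x1)
    dt = 1.0 / steps
    for i in range(steps):
        t = (steps - i) / steps
        v = guided_velocity(v_fn(x, t), t, shifts)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x

# Toy velocity: a constant field that carries x1 = [1, 1] to the origin.
def constant_velocity(x, t):
    return [1.0, 1.0]

shifts = {
    "blueprint": [-0.5, 0.0],      # hypothetical structural shift
    "instantiation": [0.0, -0.5],  # hypothetical content shift
}
baseline = sample([1.0, 1.0], constant_velocity, {})
customized = sample([1.0, 1.0], constant_velocity, shifts)
print(baseline, customized)
```

No shift is registered for the refinement stage, so the last steps follow the base velocity unchanged, which corresponds to the captions' description of injecting offsets only during the Blueprint and Instantiation Stages.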