DreamRelation: Bridging Customization and Relation Generation

Qingyu Shi, Lu Qi, Jianzong Wu, Jinbin Bai, Jingbo Wang, Yunhai Tong, Xiangtai Li

TL;DR

Relation-aware customized image generation addresses the gap where identities from image prompts and relationships from text prompts must be jointly realized. DreamRelation disentangles identity and relation learning using a relation-aware data engine and two core modules: Keypoint Matching Loss and Local Token Injection, implemented via LoRA-tuned cross-attention and dense CLIP features. Evaluations on RelationBench, DreamBench, and Multi-object CustomConcept101 show improved relation fidelity and robust identity preservation, with ablations confirming the contributions of data engineering, pose supervision, and local-feature augmentation. The work offers a practical path to more controllable, personalized image synthesis and provides benchmarks and implementation details to spur future development, while acknowledging potential societal impacts and mitigation strategies.

Abstract

Customized image generation is essential for creating personalized content based on user prompts, allowing large-scale text-to-image diffusion models to more effectively meet individual needs. However, existing models often neglect the relationships between customized objects in generated images. This work addresses this gap by focusing on relation-aware customized image generation, which seeks to preserve the identities from image prompts while maintaining the relationship specified in text prompts. Specifically, we introduce DreamRelation, a framework that disentangles identity and relation learning using a carefully curated dataset. Our training data consists of relation-specific images, independent object images containing identity information, and text prompts to guide relation generation. We then propose two key modules to tackle the two main challenges: generating accurate and natural relationships, especially when significant pose adjustments are required, and avoiding object confusion in cases of overlap. First, we introduce a keypoint matching loss that effectively guides the model in adjusting object poses closely tied to their relationships. Second, we incorporate local features of the image prompts to better distinguish between objects, preventing confusion in overlapping cases. Extensive results on our proposed benchmarks demonstrate the superiority of DreamRelation in generating precise relations while preserving object identities across a diverse set of objects and relationships.

Paper Structure

This paper contains 21 sections, 8 equations, 22 figures, 10 tables.

Figures (22)

  • Figure 1: In our Relation-Aware Image Customization task, the generated images must accurately preserve the relationships between objects and maintain their identity. We highlight the limitations of previous approaches using three color codes: red indicates failure to capture relationships, blue marks missing objects, and orange represents object confusion. Each image is annotated to reflect its specific issue. Our results, highlighted by green boxes, demonstrate the advantages of our proposed method.
  • Figure 2: Overview of DreamRelation. DreamRelation uses an off-the-shelf identity extractor to decouple the relation and identity information in relation-specific images. After obtaining the U-Net output $\hat{\epsilon}$, we predict $\hat{z}_0$ and calculate the keypoint matching loss. The part in the dotted box is used only during training.
  • Figure 3: The rigid object cropping data engine results in minimal changes to object pose. Its cropped image prompts also contain relation information, leading to a copy-and-paste effect in which the text prompt is neglected. Our relation-aware data engine, in contrast, focuses on relation learning by decoupling identity information from images.
  • Figure 4: Comparison of our method with ReVersion across different base models. Our approach demonstrates superior performance in the relation-aware generation task.
  • Figure 5: Additional results demonstrate the effectiveness of relation-aware generation. DreamRelation adapts the same pair of objects to different relationships in a natural and accurate manner.
  • ...and 17 more figures
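
The Figure 2 caption mentions predicting $\hat{z}_0$ from the U-Net output $\hat{\epsilon}$ before computing the keypoint matching loss. The excerpt does not state which noise schedule or parameterization is used, so the following is only a minimal sketch assuming the standard DDPM forward process $z_t = \sqrt{\bar{\alpha}_t}\, z_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$. The function names `add_noise` and `predict_z0` and the parameter `alpha_bar_t` are hypothetical, and the keypoint matching loss itself is not shown because its form is not given here.

```python
import math


def add_noise(z_0, eps, alpha_bar_t):
    """Forward diffusion step (assumed standard DDPM form):
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    sqrt_a = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a = math.sqrt(1.0 - alpha_bar_t)
    return [sqrt_a * z + sqrt_one_minus_a * e for z, e in zip(z_0, eps)]


def predict_z0(z_t, eps_hat, alpha_bar_t):
    """Solve the forward equation for z_0, using the predicted noise eps_hat
    in place of the true noise (a one-step estimate of the clean latent)."""
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must be in (0, 1]")
    sqrt_a = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a = math.sqrt(1.0 - alpha_bar_t)
    return [(z - sqrt_one_minus_a * e) / sqrt_a for z, e in zip(z_t, eps_hat)]


# Toy latent flattened to a list; real latents would be tensors.
z_0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.2, -0.3]
alpha_bar_t = 0.64  # illustrative value, not taken from the paper

z_t = add_noise(z_0, eps, alpha_bar_t)
# With a perfect noise prediction (eps_hat == eps), the clean latent is recovered.
z_0_hat = predict_z0(z_t, eps, alpha_bar_t)
```

Under this assumption, a loss defined on the estimated clean latent (or an image decoded from it) can be evaluated at any training timestep without running the full sampling loop, which may be why the estimate appears only in the training-time part of the diagram.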