
Event-Customized Image Generation

Zhen Wang, Yilei Jiang, Dong Zheng, Jun Xiao, Long Chen

TL;DR

This work defines event-customized image generation from a single reference image and introduces FreeEvent, a training-free diffusion-based framework with two dedicated paths (entity switching and event transferring) to preserve complex events while generating new target entities. It further provides two benchmarks, SWiG-Event and Real-Event, to evaluate event fidelity, entity alignment, and image quality, showing state-of-the-art performance against several baselines. The approach enables plug-and-play combination with subject customization and demonstrates strong generalization to diverse scenes, while discussing trade-offs and practical considerations for real-world use. Overall, FreeEvent broadens customized image generation to complex multi-entity events without requiring per-action training data.

Abstract

Customized image generation, which generates customized images with user-specified concepts, has attracted significant attention due to its creativity and novelty. With impressive progress achieved in subject customization, some pioneering works have further explored the customization of actions and interactions beyond entity (i.e., human, animal, and object) appearance. However, these approaches only focus on basic actions and interactions between two entities, and their effects are limited by the lack of sufficient "exactly same" reference images. To extend customized image generation to more complex scenes for general real-world applications, we propose a new task: event-customized image generation. Given a single reference image, we define the "event" as all specific actions, poses, relations, or interactions between different entities in the scene. This task aims at accurately capturing the complex event and generating customized images with various target entities. To solve this task, we propose a novel training-free event customization method: FreeEvent. Specifically, FreeEvent introduces two extra paths alongside the general diffusion denoising process: 1) Entity switching path: it applies cross-attention guidance and regulation for target entity generation. 2) Event transferring path: it injects the spatial features and self-attention maps from the reference image into the target image for event generation. To further facilitate this new task, we collect two evaluation benchmarks: SWiG-Event and Real-Event. Extensive experiments and ablations demonstrate the effectiveness of FreeEvent.
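
The self-attention injection in the event transferring path can be illustrated with a toy example. The following is a minimal sketch under stated assumptions, not the FreeEvent implementation: it uses a single attention layer with random NumPy arrays in place of U-Net features, and the names `self_attention_map`, `attend`, and `injected_map` are hypothetical. It only shows the general idea of reusing the attention map computed on the reference branch while keeping the values of the target branch.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention_map(q, k):
    """Return the (tokens, tokens) attention map softmax(q k^T / sqrt(d))."""
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))


def attend(q, k, v, injected_map=None):
    """Self-attention output.

    If `injected_map` is given (hypothetical parameter), it replaces the map
    that would otherwise be computed from q and k; the values v are untouched.
    """
    attn = self_attention_map(q, k) if injected_map is None else injected_map
    return attn @ v


rng = np.random.default_rng(0)
tokens, dim = 16, 8  # toy sizes, chosen only for illustration

# Assumption: random stand-ins for the query/key/value projections of one layer.
q_ref, k_ref = rng.normal(size=(tokens, dim)), rng.normal(size=(tokens, dim))
q_tgt, k_tgt, v_tgt = (rng.normal(size=(tokens, dim)) for _ in range(3))

# Attention map from the reference branch (carries the spatial layout).
ref_map = self_attention_map(q_ref, k_ref)

# Target branch without and with injection of the reference map.
plain = attend(q_tgt, k_tgt, v_tgt)
injected = attend(q_tgt, k_tgt, v_tgt, injected_map=ref_map)
```

In this sketch the injected output mixes the target values according to the reference layout, which is why the structure of the reference scene can carry over while the content comes from the target branch. Which layers and denoising steps receive the injection is not specified here and would have to be taken from the paper itself.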

Paper Structure (21 sections, 6 equations, 14 figures, 3 tables)


Figures (14)

  • Figure 1: Overview of the pipeline. Given the reference image, event customization is a general diffusion denoising process with two extra paths. 1) The entity switching path guides the generation of each target entity through cross-attention guidance and regulation. 2) The event transferring path injects the spatial features and self-attention maps from the reference image into the denoising process. The final $z^G_0$ is then transformed back to the target image $I^G$ by the decoder.
  • Figure 2: (a) The architecture of the U-Net layer. (b) The process of cross-attention guidance and regulation. (c) The process of spatial feature and self-attention injection.
  • Figure 3: Comparison of Event Customization. Different colors and numbers show the associations between reference entities and their corresponding target prompts.
  • Figure 4: Ablations of the proposed paths and the target prompt. The "guidance" and "regulation" denote the cross-attention guidance and cross-attention regulation in entity switching path, respectively. The "injection" denotes the event transferring path.
  • Figure 5: Results of Event-Subject Customization. Different colors and numbers show the associations between reference entities and their corresponding target prompts.
  • ...and 9 more figures