TraDiffusion: Trajectory-Based Training-Free Image Generation

Mingrui Wu, Oucheng Huang, Jiayi Ji, Jiale Li, Xinyue Cai, Huafeng Kuang, Jianzhuang Liu, Xiaoshuai Sun, Rongrong Ji

TL;DR

A training-free, trajectory-based controllable T2I approach, termed TraDiffusion, that allows users to effortlessly guide image generation via mouse trajectories and showcases the ability to manipulate salient regions, attributes, and relationships within the generated images.

Abstract

In this work, we propose a training-free, trajectory-based controllable text-to-image (T2I) approach, termed TraDiffusion. This method allows users to guide image generation with mouse trajectories. To achieve precise control, we design a distance awareness energy function that guides the latent variables so that generation is focused within the areas defined by the trajectory. The energy function comprises a control function, which draws the generation closer to the specified trajectory, and a movement function, which diminishes activity in areas distant from the trajectory. Extensive experiments and qualitative assessments on the COCO dataset show that TraDiffusion facilitates simpler, more natural image control. Moreover, it showcases the ability to manipulate salient regions, attributes, and relationships within the generated images, alongside visual input based on arbitrary or enhanced trajectories.
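
To make the two-part energy concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's equations: the inverse-distance weighting in the control term, the normalized-distance weighting in the movement term, the squaring, and the names `distance_matrix`, `distance_energy`, and `movement_weight` are all hypothetical choices made for this example.

```python
import math

def distance_matrix(trajectory, height, width):
    """Euclidean distance from each grid cell to its nearest trajectory point."""
    return [[min(math.hypot(r - tr, c - tc) for tr, tc in trajectory)
             for c in range(width)]
            for r in range(height)]

def distance_energy(attention, dist, movement_weight=1.0):
    """Toy distance-aware energy: control term plus weighted movement term.

    Both terms are assumed forms chosen for illustration only.
    """
    cells = [(a, d) for row_a, row_d in zip(attention, dist)
             for a, d in zip(row_a, row_d)]
    total = sum(a for a, _ in cells) or 1.0   # guard against an all-zero map
    peak = max(d for _, d in cells) or 1.0    # guard against an all-zero distance map

    # Control term: approaches 0 as attention mass concentrates on the trajectory.
    near = sum(a / (1.0 + d) for a, d in cells) / total
    control = (1.0 - near) ** 2

    # Movement term: grows with the share of attention far from the trajectory.
    far = sum(a * d / peak for a, d in cells) / total
    movement = far ** 2

    return control + movement_weight * movement

# A diagonal trajectory on a 4x4 grid, given as (row, column) points.
trajectory = [(0, 0), (1, 1), (2, 2), (3, 3)]
dist = distance_matrix(trajectory, 4, 4)

on_path = [[0.0] * 4 for _ in range(4)]
on_path[1][1] = 1.0    # all attention on a trajectory cell
off_path = [[0.0] * 4 for _ in range(4)]
off_path[0][3] = 1.0   # all attention in the corner farthest from the trajectory

print(distance_energy(on_path, dist), distance_energy(off_path, dist))
```

In this toy setting, an attention map that sits on the trajectory receives a lower energy than one concentrated far away, which is the behaviour the abstract attributes to the control and movement functions.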

Paper Structure

This paper contains 31 sections, 6 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Comparison of the mask-conditioned method (a), the box-conditioned method (b), and our trajectory-conditioned method (c). The mask-conditioned method offers precise control over object shape, but it requires a fine mask that must be produced with a specialized tool. The box-conditioned method enables only coarse layout control. Our trajectory-conditioned method provides a level of control granularity between the fine mask and the coarse box, and it is user-friendly.
  • Figure 2: Overview of the distance awareness guidance. With the provided trajectories, we calculate distance matrices for each trajectory. Subsequently, we compute the distance awareness energy function between these distance matrices and the attention map of each object. Finally, during the inference process, we conduct backpropagation to optimize the latent code.
  • Figure 3: Examples of controlling the salient areas of the objects with trajectories. We can adjust the position of the local salient area of the object by enhancing the local trajectory.
  • Figure 4: Examples of controlling the object shapes with arbitrary trajectories. We can adjust the posture of the object (top) or specify the approximate shape of the object (bottom) by varying the given trajectory.
  • Figure 5: Examples of controlling the attribute and relationship of objects. Based on trajectories, we can overcome the attribute confusion issue of the pre-trained Stable Diffusion model, generating visual results consistent with the given prompt (a), and adjust the positions of interactions (b).
  • ...and 11 more figures
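
The latent-optimization step described in the Figure 2 caption can be sketched with a toy model. This is a sketch under strong assumptions rather than the authors' implementation: the "attention map" here is simply a softmax over a one-dimensional latent, the energy is the attention-weighted distance, and the gradient is written in closed form instead of being obtained by backpropagation through a diffusion model. The names `guide_latent`, `step_size`, and `steps` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def guide_latent(latent, dist, step_size=5.0, steps=20):
    """Gradient descent on the toy energy E(z) = sum_i softmax(z)_i * dist_i."""
    z = list(latent)
    for _ in range(steps):
        attn = softmax(z)
        energy = sum(a * d for a, d in zip(attn, dist))
        # Closed-form gradient of E with respect to z_i: attn_i * (dist_i - E).
        grad = [a * (d - energy) for a, d in zip(attn, dist)]
        # Move the latent against the gradient to lower the energy.
        z = [v - step_size * g for v, g in zip(z, grad)]
    return z

# Five cells in a row; the trajectory passes through the middle cell (distance 0).
dist = [1.0, 0.5, 0.0, 0.5, 1.0]
start = [0.0] * 5
end = guide_latent(start, dist)

energy_before = sum(a * d for a, d in zip(softmax(start), dist))
energy_after = sum(a * d for a, d in zip(softmax(end), dist))
print(energy_before, energy_after)
```

Each update shifts attention mass toward the cell on the trajectory, so the energy falls over the iterations. In a real pipeline the attention maps would come from the diffusion model and the gradient from automatic differentiation, and neither is modelled here.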