Repairing Catastrophic-Neglect in Text-to-Image Diffusion Models via Attention-Guided Feature Enhancement

Zhiyuan Chang, Mingyang Li, Junjie Wang, Yi Liu, Qing Wang, Yang Liu

TL;DR

Text-to-Image diffusion models frequently suffer from catastrophic-neglect, where images generated from multi-object prompts omit one or more of the described objects. The authors introduce Patcher, a two-stage, attention-guided repair pipeline that first identifies neglected objects and then rewrites the prompt using explicit feature enhancement (LLM-generated shape/color modifiers) and implicit feature enhancement (WordNet hyponyms), with the choice guided by cross-attention differences. Experiments on Stable Diffusion v1.4/v1.5/v2.1 and three prompt datasets show Correct Rate improvements of roughly 10.1%-16.3% over the baselines, and ablations indicate that both explicit and implicit features contribute. The work offers a practical, automated way to improve semantic alignment for multi-object prompts and highlights attention differences as a useful signal for prompt refinement, while noting limitations around attribute neglect and the cost of iteration.

Abstract

Text-to-Image Diffusion Models (T2I DMs) have garnered significant attention for their ability to generate high-quality images from textual descriptions. However, these models often produce images that do not fully align with the input prompts, resulting in semantic inconsistencies. The most prominent issue among these semantic inconsistencies is catastrophic-neglect, where the images generated by T2I DMs miss key objects mentioned in the prompt. We first conduct an empirical study on this issue, exploring the prevalence of catastrophic-neglect, potential mitigation strategies with feature enhancement, and the insights gained. Guided by the empirical findings, we propose an automated repair approach named Patcher to address catastrophic-neglect in T2I DMs. Specifically, Patcher first determines whether there are any neglected objects in the prompt, and then applies attention-guided feature enhancement to these neglected objects, resulting in a repaired prompt. Experimental results on three versions of Stable Diffusion demonstrate that Patcher effectively repairs the issue of catastrophic-neglect, achieving 10.1%-16.3% higher Correct Rate in image generation compared to baselines.
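
To make the repair step more concrete, the following is a minimal sketch of how attention-guided feature enhancement might be organized. It is not the authors' implementation. The lookup tables stand in for the LLM (explicit modifiers) and WordNet (hyponyms), the attention scores are made-up numbers rather than real cross-attention values, and the selection rule (pick the candidate that most reduces the attention gap between the neglected object and the strongest other object) is an assumption about how attention differences could guide the choice. All names (`candidates`, `attention_gap`, `repair`, `toy_attention`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: the summary says explicit modifiers come from an LLM
# and hyponyms from WordNet; small hard-coded tables keep the sketch runnable.
EXPLICIT_MODIFIERS: Dict[str, List[str]] = {"bear": ["brown", "furry"]}
HYPONYMS: Dict[str, List[str]] = {"bear": ["grizzly bear", "polar bear"]}

# Assumed interface: maps (prompt, object phrases) to one attention score per phrase.
AttentionFn = Callable[[str, List[str]], Dict[str, float]]


def candidates(prompt: str, neglected: str) -> List[Tuple[str, str]]:
    """Return (object_phrase, repaired_prompt) pairs for one neglected object."""
    out: List[Tuple[str, str]] = []
    for modifier in EXPLICIT_MODIFIERS.get(neglected, []):
        phrase = f"{modifier} {neglected}"  # explicit feature: add a modifier
        out.append((phrase, prompt.replace(neglected, phrase, 1)))
    for hyponym in HYPONYMS.get(neglected, []):
        # implicit feature: swap in a more specific noun
        out.append((hyponym, prompt.replace(neglected, hyponym, 1)))
    return out


def attention_gap(scores: Dict[str, float], target: str) -> float:
    """Absolute gap between the target's score and the strongest other object."""
    others = [s for name, s in scores.items() if name != target]
    return abs(max(others, default=scores[target]) - scores[target])


def repair(prompt: str, objects: List[str], neglected: str,
           attention_fn: AttentionFn) -> str:
    """Pick the candidate prompt with the smallest attention gap (assumed rule)."""
    best_prompt = prompt
    best_gap = attention_gap(attention_fn(prompt, objects), neglected)
    for phrase, new_prompt in candidates(prompt, neglected):
        new_objects = [phrase if o == neglected else o for o in objects]
        gap = attention_gap(attention_fn(new_prompt, new_objects), phrase)
        if gap < best_gap:
            best_prompt, best_gap = new_prompt, gap
    return best_prompt


# Made-up scores for illustration only; a real system would read these
# from the diffusion model's cross-attention maps.
TOY_SCORES = {"bear": 0.2, "turtle": 0.6, "brown bear": 0.4,
              "furry bear": 0.3, "grizzly bear": 0.55, "polar bear": 0.7}


def toy_attention(prompt: str, objects: List[str]) -> Dict[str, float]:
    return {o: TOY_SCORES[o] for o in objects}


repaired = repair("a bear and a turtle", ["bear", "turtle"], "bear", toy_attention)
print(repaired)  # a grizzly bear and a turtle
```

With these toy scores, the hyponym "grizzly bear" wins because its score lands closest to that of "turtle". In practice the detection of neglected objects (the first stage) and the attention scores would both come from the T2I model and its generated images, which this sketch leaves out.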

Paper Structure

This paper contains 26 sections, 1 equation, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: Examples of catastrophic neglect in images generated by T2I DMs, and of enhancement with explicit and implicit features.
  • Figure 2: The overview of Patcher. The procedure in the dashed box is executed only the first time.
  • Figure 3: Images generated by original prompts and repaired prompts.