
Order Is Not Layout: Order-to-Space Bias in Image Generation

Yongkang Zhang, Zonglin Zhao, Yuechen Zhang, Fei Ding, Pei Li, Wenxuan Wang

TL;DR

The paper introduces OTS-Bench, which isolates order effects with paired prompts differing only in entity order, and shows that both targeted fine-tuning and early-stage intervention strategies can substantially reduce OTS while preserving generation quality.

Abstract

We study a systematic bias in modern image generation models: the order in which entities are mentioned in the text spuriously determines spatial layout and entity-role binding. We term this phenomenon Order-to-Space Bias (OTS) and show that it arises in both text-to-image and image-to-image generation, often overriding grounded cues and causing incorrect layouts or swapped assignments. To quantify OTS, we introduce OTS-Bench, which isolates order effects with paired prompts differing only in entity order and evaluates models along two dimensions: homogenization and correctness. Experiments show that OTS is widespread in modern image generation models and provide evidence that it is primarily data-driven and manifests during the early stages of layout formation. Motivated by this insight, we show that both targeted fine-tuning and early-stage intervention strategies can substantially reduce OTS while preserving generation quality.
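The paired-prompt idea behind OTS-Bench can be illustrated with a minimal sketch. It assumes a hypothetical neutral template (`TEMPLATE`), hypothetical helpers `make_pair` and `first_mentioned_left_rate`, and per-image labels ("left", "right", or `None` for invalid outputs) produced by some separate judge; none of these names or the exact scoring rule come from the paper, whose homogenization metric may be defined differently.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical neutral template: it names both entities but gives no spatial cue.
TEMPLATE = "A photo of a {a} and a {b}."


@dataclass(frozen=True)
class PromptPair:
    original: str
    swapped: str


def make_pair(entity_a: str, entity_b: str, template: str = TEMPLATE) -> PromptPair:
    """Return two prompts that differ only in which entity is mentioned first."""
    return PromptPair(
        original=template.format(a=entity_a, b=entity_b),
        swapped=template.format(a=entity_b, b=entity_a),
    )


def first_mentioned_left_rate(labels: Iterable[Optional[str]]) -> float:
    """Fraction of valid outputs whose first-mentioned entity is on the left.

    Each label is "left", "right", or None for an invalid/unscorable image;
    invalid images are dropped before computing the rate (hypothetical rule).
    """
    valid = [lab for lab in labels if lab in ("left", "right")]
    if not valid:
        return float("nan")
    return sum(lab == "left" for lab in valid) / len(valid)


pair = make_pair("cat", "dog")
print(pair.original)  # A photo of a cat and a dog.
print(pair.swapped)   # A photo of a dog and a cat.
print(first_mentioned_left_rate(["left", "left", "right", None]))
```

Under this assumed scoring, a rate near 0.5 on neutral prompts indicates no left-right preference, while a rate near 1.0 indicates that the first-mentioned entity is consistently placed on the left, the pattern the paper describes as order-driven homogenization.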

Paper Structure (56 sections, 8 equations, 12 figures, 19 tables)

Figures (12)

  • Figure 1: Order-to-Space Bias makes models treat mention order as a cue for spatial layout and role binding. (a) With neutral prompts, the first-mentioned entity tends to appear on the left, and under-specified edits are often applied to a default side. (b) When order conflicts with grounded cues, the model follows order instead, producing incorrect layouts or role inversions.
  • Figure 2: Overview of OTS-Bench. We evaluate OTS in four settings: T2I and I2I, each with homogenization and correctness.
  • Figure 3: Multi-agent pipeline for generating constrained test cases. Task Explorer proposes candidates, Insight Synthesizer validates real-world constraints, and Diversity Architect produces structured data.
  • Figure 4: Representative outputs for T2I homogenization labeling. Outputs 1 and 2 correspond to valid left-right layouts, while Output 3 illustrates invalid generations.
  • Figure 5: Representative T2I examples illustrating order-to-space effects. Row 1 shows correct spatial grounding. Row 2 demonstrates persistent left/right reversals even when spatial constraints are explicitly specified in the prompt. Row 3 contains invalid or unscorable cases excluded from evaluation.
  • ...and 7 more figures
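The correctness setting (Figure 1b, Figure 5) can be illustrated by splitting accuracy according to whether mention order agrees or conflicts with the layout the prompt explicitly requests. This is a minimal sketch under stated assumptions: `Judgement` and `correctness_by_condition` are hypothetical names, correctness labels are assumed to come from an external judge, and invalid images are excluded as in Figure 5, row 3. It is not the paper's scoring code.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Judgement:
    # True when mention order matches the requested layout, e.g.
    # "a cat on the left of a dog"; False for e.g. "a dog on the right of a cat".
    order_agrees: bool
    # Whether the generated layout satisfies the prompt; None marks an
    # invalid/unscorable image, which is excluded from scoring.
    correct: Optional[bool]


def correctness_by_condition(items: Iterable[Judgement]) -> Dict[str, float]:
    """Accuracy on valid images, split by whether order agrees with the constraint."""
    items = list(items)
    result = {}
    for name, agrees in (("agree", True), ("conflict", False)):
        scored = [
            j.correct for j in items
            if j.order_agrees == agrees and j.correct is not None
        ]
        result[name] = sum(scored) / len(scored) if scored else float("nan")
    return result


judgements = [
    Judgement(order_agrees=True, correct=True),
    Judgement(order_agrees=True, correct=True),
    Judgement(order_agrees=False, correct=False),
    Judgement(order_agrees=False, correct=True),
    Judgement(order_agrees=False, correct=None),
]
print(correctness_by_condition(judgements))  # {'agree': 1.0, 'conflict': 0.5}
```

In this assumed setup, a large gap between the "agree" and "conflict" accuracies is the signature of a model following mention order instead of the explicit spatial cue.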