Edit Spillover as a Probe: Do Image Editing Models Implicitly Understand World Relations?

Guandong Li; Zhaobin Chu

Abstract

Instruction-following image editing models are expected to modify only the specified region while leaving the rest of the image unchanged. In practice, however, we observe a pervasive phenomenon, edit spillover: models alter semantically related but unspecified content outside the edit region. This raises a fundamental question: does spillover reflect genuine implicit world understanding, or is it merely attention leakage? We propose EditSpilloverProbe, a systematic framework that repurposes edit spillover as a natural probe for world knowledge in image editing models. We introduce a spillover taxonomy (spatial, semantic, mixed, random), an automated detection-and-classification pipeline, and EditSpilloverBench, a benchmark dataset constructed from real-world Chinese text editing tasks. Systematic evaluation of 5 representative editing models reveals three core findings: (1) spillover rates vary widely across architectures, from 3.49% to 11.46%, a 3.3x ratio; (2) the absolute quantity of semantic spillover reflects a model's world understanding capability: nano_banana produces the most semantic spillover (27.8 per image), while qwen_2511 has the most precise editing control but less semantic spillover (16.3 per image), indicating a trade-off between editing control and world understanding; (3) spatial decay analysis shows that spillover area density decays exponentially with distance, while the proportion of semantically relevant spillover remains constant (40%-58%), providing direct evidence that semantic spillover reflects genuine world understanding rather than spatial diffusion.
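The logic behind finding (3) can be illustrated with a minimal sketch. It assumes each detected spillover region has already been reduced to a distance from the edit region, a changed area, and a semantic/non-semantic label. The record type `SpillRegion`, the function `decay_profile`, the pixel units, and the toy numbers are all hypothetical and are not taken from the paper's pipeline. The sketch reports total area per distance bin; a true area density would further normalize by the area available in each distance band.

```python
from collections import defaultdict
from typing import NamedTuple


class SpillRegion(NamedTuple):
    """Hypothetical record for one detected spillover region."""
    distance: float    # distance from the edit region (assumed: pixels)
    area: float        # changed area of the region (assumed: pixels)
    is_semantic: bool  # True if labeled as semantically related to the edit


def decay_profile(regions, bin_width):
    """Bin regions by distance.

    Returns {bin_index: (total_area, semantic_share)}, where semantic_share
    is the fraction of regions in that bin labeled as semantic.
    """
    areas = defaultdict(float)
    counts = defaultdict(int)
    semantic = defaultdict(int)
    for r in regions:
        b = int(r.distance // bin_width)
        areas[b] += r.area
        counts[b] += 1
        semantic[b] += int(r.is_semantic)
    return {b: (areas[b], semantic[b] / counts[b]) for b in sorted(counts)}


# Toy data, made up for illustration only (not results from the paper).
regions = [
    SpillRegion(10, 400.0, True), SpillRegion(20, 400.0, False),
    SpillRegion(110, 40.0, True), SpillRegion(130, 40.0, False),
    SpillRegion(210, 4.0, True), SpillRegion(250, 4.0, False),
]
profile = decay_profile(regions, bin_width=100)
for b, (area, share) in profile.items():
    print(f"bin {b}: total area = {area}, semantic share = {share}")
```

In this toy input the total area drops tenfold per bin while the semantic share stays fixed, which is the pattern the abstract describes: if semantic spillover were only spatial diffusion, its share would be expected to fall along with the area.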

Paper Structure (29 sections, 6 equations, 3 figures, 8 tables)

Introduction
Related Work
World Models and Visual Reasoning
Image Editing Evaluation
Neural Network Probing
Method
Overview
Problem Formalization
Spillover Detection Pipeline
Spillover Classification
World Understanding Metrics
EditSpilloverBench
Data Source
Dataset Composition
Controlled Test Groups
...and 14 more sections

Figures (3)

Figure 1: Cross-model comparison radar chart. Models exhibit different trade-offs between spillover control (SSIM, low Spill%) and world understanding (WUS, Semantic Density).
Figure 2: Spillover area density decay with distance. All models exhibit exponential decay, with nano_banana showing the steepest decline (200$\times$ from near to far distance).
Figure 3: Semantic spillover proportion remains constant across distances (39%--58%), providing direct evidence that semantic spillover is not driven by spatial proximity.
