KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

Yongliang Wu; Zonghui Li; Xinting Hu; Xinyu Ye; Xianfang Zeng; Gang Yu; Wenbo Zhu; Bernt Schiele; Ming-Hsuan Yang; Xu Yang

TL;DR

KRIS-Bench introduces a cognitively grounded benchmark for instruction-based image editing that assesses knowledge-based reasoning via a three-type knowledge taxonomy (Factual, Conceptual, Procedural), 22 tasks, and 1,267 annotated instances. It adds a Knowledge Plausibility metric with knowledge hints and human validation, enabling more reliable evaluation of real-world knowledge integration. Across 10 state-of-the-art systems, results reveal substantial gaps in knowledge-grounded editing, with procedural and domain-specific reasoning proving especially challenging. The benchmark provides a principled framework for advancing knowledge-centric image editing and highlights directions for future research in cognitively aligned, reasoning-aware editing systems.

Abstract

Recent advances in multi-modal generative models have enabled significant progress in instruction-based image editing. However, while these models produce visually plausible outputs, their capacity for knowledge-based reasoning in editing tasks remains under-explored. In this paper, we introduce KRIS-Bench (Knowledge-based Reasoning in Image-editing Systems Benchmark), a diagnostic benchmark designed to assess models through a cognitively informed lens. Drawing from educational theory, KRIS-Bench categorizes editing tasks across three foundational knowledge types: Factual, Conceptual, and Procedural. Based on this taxonomy, we design 22 representative tasks spanning 7 reasoning dimensions and release 1,267 high-quality annotated editing instances. To support fine-grained evaluation, we propose a comprehensive protocol that incorporates a novel Knowledge Plausibility metric, enhanced by knowledge hints and calibrated through human studies. Empirical results on 10 state-of-the-art models reveal significant gaps in reasoning performance, highlighting the need for knowledge-centric benchmarks to advance the development of intelligent image editing systems.
