
VisionCreator-R1: A Reflection-Enhanced Native Visual-Generation Agentic Model

Jinxiang Lai, Wenzhe Zhao, Zexin Lu, Hualei Zhang, Qinyu Yang, Rongwei Quan, Zhimin Li, Shuai Shao, Song Guo, Qinglin Lu

TL;DR

This work proposes VisionCreator-R1, a native visual generation agent with explicit reflection, together with a Reflection-Plan Co-Optimization (RPCO) training methodology. The resulting unified agent consistently outperforms Gemini2.5Pro on existing benchmarks and on VCR-bench, which covers single-image and multi-image tasks.

Abstract

Visual content generation has advanced from single-image to multi-image workflows, yet existing agents remain largely plan-driven and lack systematic reflection mechanisms to correct mid-trajectory visual errors. To address this limitation, we propose VisionCreator-R1, a native visual generation agent with explicit reflection, together with a Reflection-Plan Co-Optimization (RPCO) training methodology. Through extensive experiments and trajectory-level analysis, we uncover a reflection-plan optimization asymmetry in reinforcement learning (RL): planning can be reliably optimized via plan rewards, while reflection learning is hindered by noisy credit assignment. Guided by this insight, RPCO first trains on the self-constructed VCR-SFT dataset, which combines reflection-strong single-image trajectories with planning-strong multi-image trajectories, and then co-optimizes on the VCR-RL dataset via RL. This yields our unified VisionCreator-R1 agent, which consistently outperforms Gemini2.5Pro on existing benchmarks and on our VCR-bench, which covers single-image and multi-image tasks.

Paper Structure

This paper contains 42 sections, 1 theorem, 21 equations, 7 figures, and 5 tables.

Key Result

Theorem 3.1

Consider the GRPO optimization objective and its stochastic gradient estimator for a single token generation step $t$ in trajectory $i$. Given state $s = (q, o_{i,<t})$ and sampled action $a = o_{i,t}$, the gradient estimator is […], where $\hat{A}_{i,t}$ is the normalized advantage derived from the trajectory-level reward, $\pi_\theta$ is the current policy, $\pi_{\mathrm{ref}}$ is a fixed reference policy, …
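
To make the phrase "normalized advantage derived from trajectory-level reward" concrete, here is a minimal sketch of group-relative advantage normalization as it is commonly described for GRPO. It is an illustration under stated assumptions, not the paper's implementation: the function names, the toy rewards, and the `eps` stabilizer are all hypothetical, and the paper's exact estimator may differ.

```python
import math


def group_normalized_advantages(rewards, eps=1e-8):
    """Return one normalized advantage per trajectory in a sampled group.

    Assumption: the advantage is (reward - group mean) / (group std + eps),
    the usual group-relative form; `eps` is a hypothetical stabilizer.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n  # population variance
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def broadcast_to_tokens(advantages, token_counts):
    """Copy each trajectory-level advantage to every token of that trajectory."""
    return [[a] * n for a, n in zip(advantages, token_counts)]


# Toy group of four trajectories for one query (hypothetical rewards).
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [3, 2, 4, 1]

adv = group_normalized_advantages(rewards)
per_token = broadcast_to_tokens(adv, token_counts)
```

In this sketch every token in trajectory $i$ receives the same scalar $\hat{A}_{i,t}$, whether it belongs to a planning step or a reflection step. That shared signal is one way to picture why token-level credit assignment can be coarse when the reward is only available at the trajectory level.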

Figures (7)

  • Figure 1: Comparison between without-Reflection, with Good-Reflection, Under-Reflection and Over-Reflection.
  • Figure 2: Performance - Reflection Quality.
  • Figure 3: (a) Our Native VisionCreator-R1 framework. (b) Reflection–Plan Co-Optimization (RPCO) Training Paradigm.
  • Figure 4: Task distribution of the VCR dataset.
  • Figure 5: VCR-SFT dataset distribution.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Theorem 3.1: Structural Variance Asymmetry in Multi-Image GRPO