MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling

MiroMind Team, Song Bai, Lidong Bing, Carson Chen, Guanzheng Chen, Yuntao Chen, Zhe Chen, Ziyi Chen, Jifeng Dai, Xuan Dong, Wenhan Dou, Yue Deng, Yunjie Fu, Junqi Ge, Chenxia Han, Tammy Huang, Zhenhang Huang, Jerry Jiao, Shilei Jiang, Tianyu Jiao, Xiaoqi Jian, Lei Lei, Ruilin Li, Ryan Luo, Tiantong Li, Xiang Lin, Ziyuan Liu, Zhiqi Li, Jie Ni, Qiang Ren, Pax Sun, Shiqian Su, Chenxin Tao, Bin Wang, Hellen Wang, Haonan Wang, James Wang, Jin Wang, Jojo Wang, Letian Wang, Shizun Wang, Weizhi Wang, Zixuan Wang, Jinfan Xu, Sen Xing, Chenyu Yang, Hai Ye, Jiaheng Yu, Yue Yu, Muyan Zhong, Tianchen Zhao, Xizhou Zhu, Yanpeng Zhou, Yifan Zhang, Zhi Zhu

TL;DR

MiroThinker v1.0 introduces a three-dimensional scaling paradigm for open-source research agents that extends beyond model size and context length by adding interaction depth through sustained agent–environment feedback. The work presents a data-and-training pipeline that includes MultiDocQA synthesis, agentic trajectory creation, and a three-stage training regime (SFT, preference optimization, and RL with GRPO), all within a 256K context window that supports up to 600 tool calls per task. Empirical results on GAIA, HLE, BrowseComp, and BrowseComp-ZH show state-of-the-art performance among open-source agents and results approaching GPT-5-high on several benchmarks, with evidence that longer, deeper interactions yield predictable performance gains. The findings establish interaction scaling as a practical axis for building next-generation open research agents, and offer a strong baseline and an extensible platform for future exploration of agentic intelligence.

Abstract

We present MiroThinker v1.0, an open-source research agent designed to advance tool-augmented reasoning and information-seeking capabilities. Unlike previous agents that only scale up model size or context length, MiroThinker explores interaction scaling at the model level, systematically training the model to handle deeper and more frequent agent-environment interactions as a third dimension of performance improvement. Unlike LLM test-time scaling, which operates in isolation and risks degradation with longer reasoning chains, interactive scaling leverages environment feedback and external information acquisition to correct errors and refine trajectories. Through reinforcement learning, the model achieves efficient interaction scaling: with a 256K context window, it can perform up to 600 tool calls per task, enabling sustained multi-turn reasoning and complex real-world research workflows. Across four representative benchmarks (GAIA, HLE, BrowseComp, and BrowseComp-ZH), the 72B variant achieves up to 81.9%, 37.7%, 47.1%, and 55.6% accuracy, respectively, surpassing previous open-source agents and approaching commercial counterparts such as GPT-5-high. Our analysis reveals that MiroThinker benefits from interactive scaling consistently: research performance improves predictably as the model engages in deeper and more frequent agent-environment interactions, demonstrating that interaction depth exhibits scaling behaviors analogous to model size and context length. These findings establish interaction scaling as a third critical dimension for building next-generation open research agents, complementing model capacity and context windows.

Paper Structure

This paper contains 51 sections, 14 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Comparison of MiroThinker with state-of-the-art agents and agentic foundation models.
  • Figure 2: Overview of the MiroThinker v1.0 agent architecture. The framework integrates a structured tool interface (execution environment, file management, and information retrieval) with a simple recency-aware context management strategy to support interactive scaling. On the right, an example agentic trajectory illustrates the recency-based context retention mechanism, in which tool outputs from earlier turns are omitted to keep the context efficient.
  • Figure 3: Overview of the data construction pipeline. Public datasets from platforms such as HuggingFace and GitHub are filtered and verified, while raw internet data are processed through knowledge graph generation and a data engine. The resulting QA pairs from both sources are then converted into agentic trajectories, forming the complete MiroVerse v1.0 dataset used for training MiroThinker v1.0.
  • Figure 4: Training dynamics of MiroThinker-v1.0-30B during GRPO agentic RL. Because the RL environment is not identical to the final evaluation environment, performance during training differs slightly from the final evaluation results.
  • Figure 5: Illustration of interactive scaling. Reinforcement learning training leads to a substantial increase in the number and depth of agent–environment interactions, resulting in consistently improved task performance across benchmarks. All results are from MiroThinker-v1.0-30B.
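
The recency-based context retention mentioned in the Figure 2 caption can be illustrated with a short sketch. This is not the authors' implementation: the `Turn` structure, the `keep_last` parameter, the `retain_recent` function, and the placeholder string are all hypothetical names chosen for illustration. The only behavior taken from the caption is that tool outputs from earlier turns are omitted while the rest of the trajectory is kept.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical placeholder text substituted for an omitted tool output.
OMITTED = "[tool output omitted]"


@dataclass(frozen=True)
class Turn:
    """One agent-environment interaction (illustrative structure only)."""
    thought: str                # model reasoning for this turn
    action: str                 # tool call issued by the model
    observation: Optional[str]  # tool output, or None if the turn had none


def retain_recent(history: List[Turn], keep_last: int) -> List[Turn]:
    """Return a copy of `history` in which only the most recent `keep_last`
    turns keep their tool output; older outputs are replaced by a placeholder.
    Thoughts and actions are always kept, so the reasoning trace stays intact.
    """
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    cutoff = len(history) - keep_last  # turns with index < cutoff are "old"
    return [
        Turn(
            t.thought,
            t.action,
            # Keep recent outputs; leave turns without any output untouched.
            t.observation if (i >= cutoff or t.observation is None) else OMITTED,
        )
        for i, t in enumerate(history)
    ]


if __name__ == "__main__":
    demo = [Turn(f"thought {i}", f"search(q{i})", f"result {i}") for i in range(5)]
    for turn in retain_recent(demo, keep_last=2):
        print(turn.action, "->", turn.observation)
```

Keeping every thought and action while dropping only old observations is one plausible reading of the caption; the actual retention window and the placeholder format would be set by the paper's method section.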