MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

MiroMind Team; S. Bai; L. Bing; L. Lei; R. Li; X. Li; X. Lin; E. Min; L. Su; B. Wang; L. Wang; L. Wang; S. Wang; X. Wang; Y. Zhang; Z. Zhang; G. Chen; L. Chen; Z. Cheng; Y. Deng; Z. Huang; D. Ng; J. Ni; Q. Ren; X. Tang; B. L. Wang; H. Wang; N. Wang; C. Wei; Q. Wu; J. Xia; Y. Xiao; H. Xu; X. Xu; C. Xue; Z. Yang; Z. Yang; F. Ye; H. Ye; J. Yu; C. Zhang; W. Zhang; H. Zhao; P. Zhu

Abstract

We present MiroThinker-1.7, a new research agent designed for complex long-horizon reasoning tasks. Building on this foundation, we further introduce MiroThinker-H1, which extends the agent with heavy-duty reasoning capabilities for more reliable multi-step problem solving. In particular, MiroThinker-1.7 improves the reliability of each interaction step through an agentic mid-training stage that emphasizes structured planning, contextual reasoning, and tool interaction. This enables more effective multi-step interaction and sustained reasoning across complex tasks. MiroThinker-H1 further incorporates verification directly into the reasoning process at both local and global levels. Intermediate reasoning decisions can be evaluated and refined during inference, while the overall reasoning trajectory is audited to ensure that final answers are supported by coherent chains of evidence. Across benchmarks covering open-web research, scientific reasoning, and financial analysis, MiroThinker-H1 achieves state-of-the-art performance on deep research tasks while maintaining strong results on specialized domains. We also release MiroThinker-1.7 and MiroThinker-1.7-mini as open-source models, providing competitive research-agent capabilities with significantly improved efficiency.

Paper Structure (55 sections, 13 equations, 7 figures, 4 tables)

Introduction
Related Works
Agentic Large Language Models
Deep Research Agents
Agentic Workflow
Formulation
Step Loop
Episode Loop
Tools
Information Retrieval
Code Execution
File and Data Transfer
Implementation Details
Sliding-Window Filtering
Result Truncation
...and 40 more sections

Figures (7)

Figure 1: Comparison of MiroThinker with state-of-the-art agents and agentic foundation models.
Figure 2: The overview of MiroThinker-1.7 & H1.
Figure 3: Overview of the dual-pipeline QA synthesis framework. The Corpus-based Pipeline (left) focuses on topical breadth and high-throughput generation from document subgraphs. The WebHop Pipeline (right) constructs calibrated reasoning trees with web-augmented expansion and hierarchical verification to ensure reasoning rigour and controllable complexity.
Figure 4: The agentic training pipeline of MiroThinker-1.7.
Figure 5: Training dynamics of MiroThinker-1.7-mini for GRPO Agentic RL. BrowseComp-200 is our selected challenging subset from BrowseComp for faster evaluation during training. The plotted curves represent a running average with a window size of 5 to highlight the optimization trends.
...and 2 more figures
