MiroFlow: Towards High-Performance and Robust Open-Source Agent Framework for General Deep Research Tasks

Shiqian Su, Sen Xing, Xuan Dong, Muyan Zhong, Bin Wang, Xizhou Zhu, Yuntao Chen, Wenhai Wang, Yue Deng, Pengxiang Zhu, Ziyuan Liu, Tiantong Li, Jiaheng Yu, Zhe Chen, Lidong Bing, Jifeng Dai

TL;DR

This work proposes a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep reasoning mode to enhance performance, and a robust workflow execution to ensure stable and reproducible performance.

Abstract

Despite the remarkable progress of large language models (LLMs), the capabilities of standalone LLMs have begun to plateau when tackling real-world, complex tasks that require interaction with external tools and dynamic environments. Although recent agent frameworks aim to enhance model autonomy through tool integration and external interaction, they still suffer from naive workflows, unstable performance, limited support across diverse benchmarks and tasks, and heavy reliance on costly commercial APIs. In this work, we propose a high-performance and robust open-source agent framework, termed MiroFlow, which incorporates an agent graph for flexible orchestration, an optional deep reasoning mode to enhance performance, and a robust workflow execution to ensure stable and reproducible performance. Extensive experiments demonstrate that MiroFlow consistently achieves state-of-the-art performance across multiple agent benchmarks, including GAIA, BrowseComp-EN/ZH, HLE, xBench-DeepSearch, and notably FutureX. We hope it can serve as an easily accessible, reproducible, and comparable baseline for the deep research community.

Paper Structure (22 sections, 18 figures, 9 tables)

Figures (18)

  • Figure 1: Overall performance of MiroFlow on representative deep research benchmarks. MiroFlow, a high-performance and robust open-source agent framework, achieves reproducible state-of-the-art results across all benchmarks, consistently outperforming existing open-source and commercial agent systems. All MiroFlow results are obtained with a single unified configuration without any task-specific tuning, demonstrating strong generality and adaptability across heterogeneous deep research tasks.
  • Figure 2: Overview of the three-tier hierarchical MiroFlow framework architecture. The foundation tier provides reusable core components, including LLM backends, MCP-based tool sets, and generic input–output processors, which supply the basic capabilities required by all agents. The agent tier defines a set of agent nodes, each constructed by combining foundation-tier components with a specific prompt and given a list of the agents it can call. These nodes communicate through structured messages and can be flexibly instantiated or extended. The control tier assembles multiple agent nodes into an agent graph and orchestrates the end-to-end workflow: user queries enter the graph and are processed through coordinated agent interactions and tool calls, while the controller maintains task logs and checkpoints for reproducibility, supports a heavy-reasoning mode to improve accuracy, and incorporates workflow-level robustness enhancements to ensure smooth and predictable execution.
  • Figure 3: Illustration of Heavy-Reasoning Mode.
  • Figure 4: Accuracy vs. max turns on GAIA validation. Accuracy improves as the maximum number of turns increases and then saturates. More difficult problems require more turns. Multi-agent settings saturate earlier but perform slightly worse than the single-agent setting.
  • Figure 5: Example of Instability by Instruction Adherence. The agent fails to follow the spatial constraint regarding the back of the house, causing a logical inversion where it selects West-facing fronts instead of West-facing backs for the sunset design.
  • ...and 13 more figures
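
The three-tier layout described in the Figure 2 caption can be illustrated with a minimal sketch. This is not MiroFlow's actual API: the names `AgentNode`, `AgentGraph`, `callable_agents`, the stub LLM functions, and the `CALL <agent>: <message>` routing convention are all hypothetical, chosen only to show how foundation components, agent nodes, and a controlling graph with a task log and a turn limit might fit together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Foundation tier (hypothetical stand-ins): an LLM backend is any callable
# mapping a prompt string to a reply string; tools are plain named functions.
LLMBackend = Callable[[str], str]
Tool = Callable[[str], str]


@dataclass
class AgentNode:
    """Agent tier: a prompt plus foundation components and the agents it may call."""
    name: str
    prompt: str
    llm: LLMBackend
    tools: Dict[str, Tool] = field(default_factory=dict)  # not exercised in this toy run
    callable_agents: List[str] = field(default_factory=list)


@dataclass
class AgentGraph:
    """Control tier: holds the nodes, routes messages, and keeps a task log."""
    nodes: Dict[str, AgentNode]
    log: List[dict] = field(default_factory=list)

    def run(self, entry: str, query: str, max_turns: int = 4) -> str:
        current, message = entry, query
        for turn in range(max_turns):
            node = self.nodes[current]
            reply = node.llm(f"{node.prompt}\n{message}")
            self.log.append({"turn": turn, "agent": current, "reply": reply})
            # Toy routing convention (an assumption of this sketch):
            # "CALL <agent>: <message>" delegates; anything else is the final answer.
            if reply.startswith("CALL "):
                target, _, payload = reply[5:].partition(": ")
                if target not in node.callable_agents:
                    raise ValueError(f"{current} may not call {target}")
                current, message = target, payload
            else:
                return reply
        return "max turns reached"


# Stub backends so the sketch runs without any external service.
def main_llm(prompt: str) -> str:
    return "CALL searcher: capital of France"


def search_llm(prompt: str) -> str:
    return "Paris"


graph = AgentGraph(nodes={
    "main": AgentNode("main", "You plan.", main_llm, callable_agents=["searcher"]),
    "searcher": AgentNode("searcher", "You search.", search_llm),
})
answer = graph.run("main", "What is the capital of France?")
```

In this toy run the main agent delegates once and the sub-agent's reply is taken as the final answer; a fuller design would return the sub-agent's result to the caller for further reasoning. The `max_turns` cap mirrors the turn budget varied in Figure 4, and the `log` list stands in for the controller's task logs.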