MermaidFlow: Redefining Agentic Workflow Generation via Safety-Constrained Evolutionary Programming

Chengqi Zheng, Jianda Chen, Yueming Lyu, Wen Zheng Terence Ng, Haopeng Zhang, Yew-Soon Ong, Ivor Tsang, Haiyan Yin

TL;DR

MermaidFlow tackles brittleness in agentic reasoning by introducing a declarative graph representation based on Mermaid and a safety-conscious evolutionary search over that graph space. By separating planning from execution and enforcing static type and connectivity constraints, the approach yields verifiable, executable workflows with improved search efficiency. Empirical results across GSM8K, MATH, HumanEval, and MBPP show MermaidFlow achieving top performance and higher validity than code-based or prompt-only baselines, with a notable average improvement of 1.40 percentage points over the best prior baseline. The work highlights the significance of structure-aware, compiler-verified workflow design for scalable, interpretable, and robust multi-agent reasoning systems.

Abstract

Despite the promise of autonomous agentic reasoning, existing workflow generation methods frequently produce fragile, unexecutable plans due to unconstrained LLM-driven construction. We introduce MermaidFlow, a framework that redefines the agentic search space through safety-constrained graph evolution. At its core, MermaidFlow represents workflows as a verifiable intermediate representation using Mermaid, a structured and human-interpretable graph language. We formulate domain-aware evolutionary operators, i.e., crossover, mutation, insertion, and deletion, to preserve semantic correctness while promoting structural diversity, enabling efficient exploration of a high-quality, statically verifiable workflow space. Without modifying task settings or evaluation protocols, MermaidFlow achieves consistent improvements in success rates and faster convergence to executable plans on the agent reasoning benchmark. The experimental results demonstrate that safety-constrained graph evolution offers a scalable, modular foundation for robust and interpretable agentic reasoning systems.
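To make the idea of a statically verifiable workflow space more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy type system with three node types (`input`, `agent`, `output`) and a connectivity rule requiring every node to lie on a path from the single input to the single output. The names `Workflow`, `is_valid`, `insert_agent`, and `delete_node` are hypothetical and chosen only for illustration; the paper's actual constraints and operators may differ.

```python
from dataclasses import dataclass

# Hypothetical node types for illustration; the paper's type system may differ.
NODE_TYPES = {"input", "agent", "output"}


@dataclass
class Workflow:
    nodes: dict  # node id -> node type
    edges: set   # set of (source id, target id) pairs


def _reachable(start, edges):
    """Return all nodes reachable from `start` along the directed `edges`."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for u, v in edges:
            if u == cur and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_valid(wf):
    """Static check (assumed rules): known types, declared edge endpoints,
    exactly one input and one output, and every node lies on some path
    from the input to the output."""
    if any(t not in NODE_TYPES for t in wf.nodes.values()):
        return False
    if any(u not in wf.nodes or v not in wf.nodes for u, v in wf.edges):
        return False
    inputs = [n for n, t in wf.nodes.items() if t == "input"]
    outputs = [n for n, t in wf.nodes.items() if t == "output"]
    if len(inputs) != 1 or len(outputs) != 1:
        return False
    forward = _reachable(inputs[0], wf.edges)
    backward = _reachable(outputs[0], {(v, u) for u, v in wf.edges})
    return forward == set(wf.nodes) and backward == set(wf.nodes)


def insert_agent(wf, edge, new_id):
    """Insertion operator (sketch): split `edge` with a new agent node.
    Returns the candidate only if it passes the static check, else None."""
    if edge not in wf.edges or new_id in wf.nodes:
        return None
    u, v = edge
    nodes = {**wf.nodes, new_id: "agent"}
    edges = (set(wf.edges) - {edge}) | {(u, new_id), (new_id, v)}
    candidate = Workflow(nodes, edges)
    return candidate if is_valid(candidate) else None


def delete_node(wf, node_id):
    """Deletion operator (sketch): drop an agent node and its edges.
    Candidates that break connectivity are rejected by returning None."""
    if wf.nodes.get(node_id) != "agent":
        return None
    nodes = {n: t for n, t in wf.nodes.items() if n != node_id}
    edges = {(u, v) for u, v in wf.edges if node_id not in (u, v)}
    candidate = Workflow(nodes, edges)
    return candidate if is_valid(candidate) else None


base = Workflow(
    {"q": "input", "solve": "agent", "ans": "output"},
    {("q", "solve"), ("solve", "ans")},
)
grown = insert_agent(base, ("solve", "ans"), "review")  # valid candidate
broken = delete_node(base, "solve")                     # rejected: output unreachable
```

In this sketch each operator validates its own output, so an invalid candidate never enters the population. That is one simple way to keep a search inside a verifiable space; the paper's operators are described as preserving the constraints by construction.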

Paper Structure

This paper contains 35 sections, 1 theorem, 7 equations, 8 figures, 3 tables, 3 algorithms.

Key Result

Lemma 1

Let $\mathcal{S}$ denote the declarative workflow space defined in Section 3.2. For any workflow graph $G \in \mathcal{S}$ and any atomic transformation operator $\mathcal{O}$ defined above, the resulting graph $G' = \mathcal{O}(G)$ also belongs to $\mathcal{S}$:

$$\forall\, G \in \mathcal{S},\ \forall\, \mathcal{O} \in \mathbb{O}: \quad \mathcal{O}(G) \in \mathcal{S},$$

where $\mathbb{O}$ is the set of constraint-preserving operators over MermaidFlow graphs. That is, $\mathcal{S}$ is closed under all va
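One direct consequence of a closure property of this form, given here as an illustrative derivation rather than as part of the paper's statement, is that any finite sequence of operators also keeps the graph inside $\mathcal{S}$. The symbols $G_0$, $G_i$, $\mathcal{O}_i$, and $k$ are introduced only for this sketch.

```latex
\begin{aligned}
&\text{Let } G_0 \in \mathcal{S},\quad \mathcal{O}_1, \dots, \mathcal{O}_k \in \mathbb{O},\quad
  G_i = \mathcal{O}_i(G_{i-1}) \ \text{for } i = 1, \dots, k. \\
&\text{Base case: } G_0 \in \mathcal{S} \ \text{by assumption.} \\
&\text{Inductive step: } G_{i-1} \in \mathcal{S} \implies
  G_i = \mathcal{O}_i(G_{i-1}) \in \mathcal{S} \ \text{(by Lemma 1).} \\
&\text{Hence } G_k = (\mathcal{O}_k \circ \cdots \circ \mathcal{O}_1)(G_0) \in \mathcal{S}.
\end{aligned}
```

Under this reading, a search that starts from a valid workflow and applies only operators from $\mathbb{O}$ never needs to repair an invalid candidate.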

Figures (8)

  • Figure 1: An illustration of the workflow lifecycle in MermaidFlow. The workflow is modeled as a declarative graph using Mermaid code, where nodes $\mathcal{V}_{[\tau, \alpha]}$ and edges $\mathcal{E}_{[\rho]}$ are explicitly defined with annotated prompts and roles (lines 3–8), styled and typed (lines 11–21), and connected via directed edges (lines 24–30). This results in a statically verifiable, semantically typed, and structurally interpretable representation that serves as a unified interface for visualization, validation, and code generation.
  • Figure 2: Overview of the MermaidFlow framework. Left: Comparison between imperative (Python-based) and declarative (Mermaid-based) workflow representations. MermaidFlow models workflows as statically typed, verifiable graphs, enabling interpretable planning and structure-aware code generation. Right: Illustration of the safety-aware evolutionary programming process. Given historical Mermaid workflows, the EP sampler selects parent candidates and applies EP operators. Resulting workflows are evaluated by the LLM-as-Judge to update the workflow population.
  • Figure 3: An illustrative figure comparing the highest solve rates on the MATH dataset between MermaidFlow and AFlow on the training set (119 problems) and test set (486 problems) across optimization iterations.
  • Figure 4: A case study on the HumanEval dataset showcasing how MermaidFlow evolves structured agentic workflows through evolutionary programming (with a detailed example of the crossover operator). The declarative graph representation also enables reliable translation of workflow graphs into executable Python code (zoom-in view recommended).
  • Figure 5: Mermaid diagram for GSM8K.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Lemma 1: MermaidFlow Transformation Invariance
  • Definition 1