ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback

Litao Guo, Xinli Xu, Luozhou Wang, Jiantao Lin, Jinsong Zhou, Zixin Zhang, Bolan Su, Ying-Cong Chen

TL;DR

ComfyMind tackles instability in open-source general-purpose visual generation by combining a Semantic Workflow Interface (SWI) with a hierarchical search-tree planner that uses localized feedback. Built on the ComfyUI platform, it abstracts low-level node graphs into semantic modules, so tasks can be composed at a high level and corrected stage by stage. It reports a $100\%$ pass rate on ComfyBench, a GenEval score of $0.90$, and a GPT-score of $0.906$ on Reason-Edit, results comparable to GPT-Image-1 while remaining open-source. The approach offers a scalable foundation for open, general-purpose generative AI systems that must manage complex multi-stage workflows.

Abstract

With the rapid advancement of generative models, general-purpose generation has gained increasing attention as a promising approach to unify diverse tasks across modalities within a single system. Despite this progress, existing open-source frameworks often remain fragile and struggle to support complex real-world applications because they lack structured workflow planning and execution-level feedback. To address these limitations, we present ComfyMind, a collaborative AI system built on the ComfyUI platform and designed to enable robust and scalable general-purpose generation. ComfyMind introduces two core innovations: a Semantic Workflow Interface (SWI), which abstracts low-level node graphs into callable functional modules described in natural language, enabling high-level composition and reducing structural errors; and a Search Tree Planning mechanism with localized feedback execution, which models generation as a hierarchical decision process and allows adaptive correction at each stage. Together, these components improve the stability and flexibility of complex generative workflows. We evaluate ComfyMind on three public benchmarks, ComfyBench, GenEval, and Reason-Edit, which span generation, editing, and reasoning tasks. Results show that ComfyMind consistently outperforms existing open-source baselines and achieves performance comparable to GPT-Image-1. ComfyMind paves a promising path for the development of open-source general-purpose generative AI systems. Project page: https://github.com/LitaoGuo/ComfyMind
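
The two ideas in the abstract, callable modules described in natural language and a search tree with localized feedback, can be illustrated with a toy sketch. This is not the authors' implementation. The `Module` class, the `plan` function, the string artifacts, and the three toy modules are all hypothetical names invented for illustration, and a real system would execute ComfyUI workflows and have an agent choose candidates rather than trying modules in a fixed order. The sketch only shows how a failure at one node stays local: the planner tries a sibling candidate at that node and backtracks to the parent only when every candidate fails.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Module:
    """Hypothetical stand-in for one semantic workflow module."""
    name: str
    description: str           # natural-language summary a planner would read
    run: Callable[[str], str]  # maps the current artifact to a new artifact


def plan(artifact: str,
         modules: List[Module],
         is_done: Callable[[str], bool],
         max_depth: int = 4) -> Optional[List[str]]:
    """Depth-first search over module sequences with local failure handling.

    Returns the list of module names that reaches the goal, or None.
    """
    if is_done(artifact):
        return []
    if max_depth == 0:
        return None
    for module in modules:
        try:
            result = module.run(artifact)
        except Exception:
            # Local feedback: this candidate failed, so try a sibling
            # at the same node instead of restarting the whole plan.
            continue
        rest = plan(result, modules, is_done, max_depth - 1)
        if rest is not None:
            return [module.name] + rest
    # Every candidate failed here, so the caller backtracks one level.
    return None


# Toy modules; artifacts are plain strings naming the current stage.
def text_to_image(artifact: str) -> str:
    if artifact != "prompt":
        raise ValueError("expects a text prompt")
    return "image"


def upscale(artifact: str) -> str:
    if artifact != "image":
        raise ValueError("expects an image")
    return "image_hd"


def broken_upscale(artifact: str) -> str:
    raise RuntimeError("simulated execution failure")


MODULES = [
    Module("broken_upscale", "Upscale an image (always fails here).", broken_upscale),
    Module("text_to_image", "Generate an image from a text prompt.", text_to_image),
    Module("upscale", "Increase the resolution of an image.", upscale),
]

found = plan("prompt", MODULES, lambda a: a == "image_hd")
unreachable = plan("prompt", MODULES, lambda a: a == "video")
print(found)        # ['text_to_image', 'upscale']
print(unreachable)  # None
```

In this toy run the failing module is skipped at each level without discarding the partial plan already built, which is the property the abstract attributes to localized feedback.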

Paper Structure

This paper contains 32 sections, 19 figures, 5 tables.

Figures (19)

  • Figure 1: Overview of generative and editing capabilities supported by ComfyMind.
  • Figure 2: Structural comparison between ComfyMind and ComfyAgent.
  • Figure 3: Overview of ComfyMind pipeline. Given a user instruction, the system first parses the task and delegates it to Planning Agent. The Agent incrementally explores a semantic search tree, where each node proposes a candidate workflow and receives local feedback based on execution results.
  • Figure 4: Qualitative comparison on challenging GenEval (Ghosh et al., 2023) cases. Under constraints such as counting, color, position, and attribute binding, only our method successfully satisfies all instructions, clearly outperforming SD3, Janus-Pro, and GPT-Image-1.
  • Figure 5: Quantitative comparison on the Reason-Edit (Huang et al., 2024) benchmark.
  • ...and 14 more figures