Flowchart2Mermaid: A Vision-Language Model Powered System for Converting Flowcharts into Editable Diagram Code
Pritam Deka, Barry Devereux
TL;DR
Flowchart2Mermaid tackles the editability gap in flowcharts by converting images to Mermaid.js code via a vision-language model and enabling mixed-initiative refinement through inline, visual, and natural-language interactions. The system is designed as a lightweight, model-agnostic web app with bidirectional code–diagram synchronization that supports real-time rendering, export, and integration with the Mermaid Live Editor. Key contributions include a unified editing interface, a structured prompting strategy for robust code generation, and an evaluation on FlowVQA using both symbolic and high-level structural metrics. Empirical results show strong symbolic extraction and largely coherent global structure for large models, underscoring the method’s potential for producing editable, reproducible, and shareable flowcharts.
Abstract
Flowcharts are common tools for communicating processes, but they are often shared as static images that cannot be easily edited or reused. We present Flowchart2Mermaid, a lightweight web system that converts flowchart images into editable code in Mermaid.js, a markup language for visual workflows, using vision-language models guided by a detailed system prompt. The interface supports mixed-initiative refinement through inline text editing, drag-and-drop node insertion, and natural-language commands interpreted by an integrated AI assistant. Unlike prior image-to-diagram tools, our approach produces a structured, version-controllable textual representation that remains synchronized with the rendered diagram. We further introduce evaluation metrics to assess structural accuracy, flow correctness, syntax validity, and completeness across multiple models.
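To make the idea of a structured, comparable textual representation concrete, the following is a minimal sketch of how edges could be extracted from simple Mermaid flowchart code and compared between a generated diagram and a reference. It is an illustration only and not the evaluation metrics introduced in this work: the function names `extract_edges` and `edge_f1` are hypothetical, the regular expression covers only plain `-->` arrows with optional edge labels, and node identifiers are assumed to match across the two diagrams.

```python
import re
from typing import Set, Tuple

# Matches simple edges such as `A --> B` or `A[Start] -->|yes| B{Check}`.
# Assumption: only plain `-->` arrows are handled; other Mermaid link
# styles (e.g. `---`, `-.->`, `==>`) are ignored by this sketch.
EDGE_RE = re.compile(
    r"^\s*(\w+)(?:[\[\(\{].*?[\]\)\}])?\s*-->\s*(?:\|[^|]*\|\s*)?(\w+)"
)


def extract_edges(mermaid_code: str) -> Set[Tuple[str, str]]:
    """Return the set of (source_id, target_id) pairs found in the code."""
    edges = set()
    for line in mermaid_code.splitlines():
        match = EDGE_RE.match(line)
        if match:
            edges.add((match.group(1), match.group(2)))
    return edges


def edge_f1(predicted: str, reference: str) -> float:
    """Illustrative edge-level F1 score (hypothetical, not the paper's metric)."""
    pred, ref = extract_edges(predicted), extract_edges(reference)
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = """flowchart TD
    A[Start] --> B{Valid?}
    B -->|yes| C[Process]
    B -->|no| D[Reject]
"""

predicted = """flowchart TD
    A[Start] --> B{Valid?}
    B -->|yes| C[Process]
    C --> D[Reject]
"""

score = edge_f1(predicted, reference)
print(f"edge F1: {score:.3f}")
```

In this example the predicted diagram recovers two of the three reference edges and adds one spurious edge, so precision and recall are both 2/3. A realistic comparison would also need to align nodes by label rather than by identifier, since a model is free to choose its own node IDs.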
