FlowMind: Automatic Workflow Generation with LLMs

Zhen Zeng, William Watson, Nicole Cho, Saba Rahimi, Shayleen Reynolds, Tucker Balch, Manuela Veloso

TL;DR

FlowMind addresses the challenge of spontaneous-task automation by generating workflows on the fly with LLMs while grounding reasoning in reliable APIs to prevent hallucinations and protect data. It introduces a generic lecture prompt recipe that educates the LLM about context and API interfaces, enabling safe and effective code generation for workflows. The approach is evaluated on a finance-focused NCEN-QA benchmark built from N-CEN reports, with ablations demonstrating the contribution of each lecture-prompt component and a user-feedback loop that substantially boosts performance toward near-perfect accuracy. The work advances practical, privacy-preserving LLM-driven automation for finance and similar domains, and provides a benchmark and methodology for future research in API-grounded workflow generation.

Abstract

The rapidly evolving field of Robotic Process Automation (RPA) has made significant strides in automating repetitive processes, yet its effectiveness diminishes in scenarios requiring spontaneous or unpredictable tasks demanded by users. This paper introduces a novel approach, FlowMind, leveraging the capabilities of Large Language Models (LLMs) such as Generative Pretrained Transformer (GPT), to address this limitation and create an automatic workflow generation system. In FlowMind, we propose a generic prompt recipe for a lecture that helps ground LLM reasoning with reliable Application Programming Interfaces (APIs). With this, FlowMind not only mitigates the common issue of hallucinations in LLMs, but also eliminates direct interaction between LLMs and proprietary data or code, thus ensuring the integrity and confidentiality of information - a cornerstone in financial services. FlowMind further simplifies user interaction by presenting high-level descriptions of auto-generated workflows, enabling users to inspect and provide feedback effectively. We also introduce NCEN-QA, a new dataset in finance for benchmarking question-answering tasks from N-CEN reports on funds. We used NCEN-QA to evaluate the performance of workflows generated by FlowMind against baseline and ablation variants of FlowMind. We demonstrate the success of FlowMind, the importance of each component in the proposed lecture recipe, and the effectiveness of user interaction and feedback in FlowMind.
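
As a rough illustration of the lecture recipe described above, the sketch below assembles a prompt from a context statement, a list of API declarations with descriptions, and a request to write code. It is a minimal sketch under stated assumptions, not the authors' implementation: `ApiSpec`, `build_lecture_prompt`, the prompt wording, and the two API names are all hypothetical. Note that only declarations and descriptions enter the prompt, so no proprietary data or API source code is shown to the LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiSpec:
    """One API the LLM may call: its declaration plus a plain-language description."""
    declaration: str
    description: str


def build_lecture_prompt(context: str, apis: list[ApiSpec]) -> str:
    """Assemble the three recipe parts: context, API list, request to write code."""
    api_lines = [f"- {api.declaration}: {api.description}" for api in apis]
    return "\n".join([
        context,                                      # 1) set up the context
        "You may use only the following functions:",  # 2) enumerate the APIs
        *api_lines,
        # 3) prompt the LLM to write workflow code with these APIs
        "When given a task, write Python code that solves it by calling these functions.",
    ])


# Hypothetical API names and signatures, for illustration only.
apis = [
    ApiSpec("get_report(fund_name: str) -> dict",
            "Fetch the latest N-CEN report for a fund."),
    ApiSpec("extract_entity(report: dict, entity: str) -> str",
            "Read one named field from a report."),
]
lecture = build_lecture_prompt(
    "You are an assistant that answers questions about funds.", apis
)
```

The resulting `lecture` string would be sent to the LLM once, before any user query.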

Paper Structure (24 sections, 7 figures, 1 table)

Figures (7)

  • Figure 1: FlowMind automates spontaneous tasks demanded by users through on-the-fly workflow generation, advancing beyond traditional automation of repetitive tasks designed by domain experts.
  • Figure 2: Overview of the FlowMind framework. (a) Stage 1: we follow the proposed generic lecture recipe to generate a lecture prompt, which educates the LLM about the context and the APIs and prepares it to write code; (b) Stage 2: the LLM can then take user queries or tasks and auto-generate workflow code that makes use of the introduced APIs. The workflow code is executed to deliver the result. During Stage 2, we enable a feedback loop between FlowMind and the user, in which FlowMind provides a high-level description of the generated workflow in plain language, and the user gives feedback to FlowMind to approve or refine the workflow if needed.
  • Figure 3: Before an LLM takes any queries or tasks from users, we first give it a lecture. We show an example of such a lecture above. The proposed generic lecture recipe includes: 1) setting up the context, 2) enumerating the available APIs, each with its function declaration, parameters, and high-level description, and 3) prompting the LLM to write workflow code using these APIs.
  • Figure 4: NCEN-QA-Easy: example questions, corresponding workflow and result generated by FlowMind.
  • Figure 5: NCEN-QA-Intermediate: example questions, corresponding workflows and results generated by FlowMind.
  • ...and 2 more figures
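
The Stage 2 feedback loop from Figure 2 can be sketched as follows. This is a minimal sketch, not the paper's code: the function name `run_with_feedback`, the `"approve"` keyword, the `max_rounds` limit, and the stub callables are all assumptions introduced for illustration. The LLM call is replaced by a stand-in so the example runs on its own.

```python
from typing import Callable


def run_with_feedback(
    task: str,
    generate: Callable[[str, list[str]], tuple[str, str]],
    ask_user: Callable[[str], str],
    execute: Callable[[str], object],
    max_rounds: int = 3,
) -> object:
    """Generate a workflow, show its description, and refine until approved."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        code, description = generate(task, feedback)
        reply = ask_user(description)  # the user sees plain language, not code
        if reply.strip().lower() == "approve":
            return execute(code)
        feedback.append(reply)  # carry the correction into the next attempt
    raise RuntimeError("workflow not approved within max_rounds")


def fake_generate(task: str, feedback: list[str]) -> tuple[str, str]:
    # Stand-in for the LLM: it changes its workflow once feedback is given.
    if feedback:
        return "result = 2 + 2", "Add 2 and 2."
    return "result = 2 + 3", "Add 2 and 3."


def fake_execute(code: str) -> object:
    # Stand-in executor; a real system would run generated code in a sandbox.
    scope: dict = {}
    exec(code, scope)
    return scope["result"]


replies = iter(["Use 2 and 2 instead.", "approve"])
answer = run_with_feedback(
    "add numbers", fake_generate, lambda description: next(replies), fake_execute
)
```

Here the first workflow is rejected with a correction, the second is approved, and only the approved code is executed.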