FloCA: Towards Faithful and Logically Consistent Flowchart Reasoning
Jinzi Zou, Bolin Wang, Liang Li, Shuo Zhang, Nuo Xu, Junzhou Zhao
TL;DR
This work formalizes Flowchart-Oriented Dialogue (FOD), in which multi-turn conversations must progress along a flowchart's topology with faithful node grounding and valid transitions. It introduces FloCA, a zero-shot agent that delegates topology-aware reasoning to a dedicated flowchart reasoning tool while handling intent understanding and response generation with an instruction-following LLM. A novel interactive evaluation framework pairs FloCA with an LLM-based user simulator and five metrics to assess reasoning accuracy and interaction efficiency. Empirical results on FLODIAL and PFDial show that FloCA achieves state-of-the-art task success and stronger logical consistency than baselines that rely on RAG, VLMs, or fine-tuning, demonstrating the value of explicit topology-constrained graph execution. The work also provides a practical framework and codebase to advance faithful, flowchart-guided reasoning in real-world decision-support and procedural tasks.
Abstract
Flowchart-oriented dialogue (FOD) systems aim to guide users through multi-turn decision-making or operational procedures by following a domain-specific flowchart to achieve a task goal. In this work, we formalize flowchart reasoning in FOD as grounding user input to flowchart nodes at each dialogue turn while ensuring that node transitions are consistent with the correct flowchart path. Despite recent advances in LLMs for task-oriented dialogue systems, adapting them to FOD still faces two limitations: (1) LLMs lack an explicit mechanism to represent and reason over flowchart topology, and (2) they are prone to hallucinations, which leads to unfaithful flowchart reasoning. To address these limitations, we propose FloCA, a zero-shot flowchart-oriented conversational agent. FloCA uses an LLM for intent understanding and response generation while delegating flowchart reasoning to an external tool that performs topology-constrained graph execution, ensuring faithful and logically consistent node transitions across dialogue turns. We further introduce an evaluation framework with an LLM-based user simulator and five new metrics covering reasoning accuracy and interaction efficiency. Extensive experiments on the FLODIAL and PFDial datasets highlight the bottlenecks of existing LLM-based methods and demonstrate the superiority of FloCA. Our code is available at https://github.com/Jinzi-Zou/FloCA-flowchart-reasoning.
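The abstract says that FloCA delegates flowchart reasoning to an external tool performing topology-constrained graph execution, but it does not specify that tool's interface. The following is a minimal sketch of the idea under stated assumptions: the flowchart is a directed graph whose edges are labeled with answer conditions, the LLM has already mapped the user's utterance to one such label, and the names `Flowchart`, `FlowchartExecutor`, `allowed_conditions`, and `step` are hypothetical, not taken from the FloCA codebase.

```python
from dataclasses import dataclass, field


@dataclass
class Flowchart:
    """Hypothetical schema: a directed graph with condition-labeled edges."""
    start: str
    # node id -> node text (a question or an instruction)
    nodes: dict[str, str]
    # node id -> {edge condition label -> successor node id}
    edges: dict[str, dict[str, str]] = field(default_factory=dict)

    def is_terminal(self, node: str) -> bool:
        # A node with no outgoing edges ends the procedure.
        return not self.edges.get(node)


@dataclass
class FlowchartExecutor:
    """Hypothetical executor: tracks the current node and moves only along existing edges."""
    chart: Flowchart
    current: str = ""
    path: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.current:
            self.current = self.chart.start
        self.path = [self.current]

    def allowed_conditions(self) -> list[str]:
        # The only labels an LLM-predicted intent can be grounded to at this turn.
        return sorted(self.chart.edges.get(self.current, {}))

    def step(self, intent: str) -> tuple[str, bool]:
        """Apply one grounded user intent and return (node, moved).

        If the intent matches no outgoing edge, the executor stays at the
        current node instead of jumping to a node the topology does not allow.
        """
        successors = self.chart.edges.get(self.current, {})
        if intent not in successors:
            return self.current, False
        self.current = successors[intent]
        self.path.append(self.current)
        return self.current, True


# Usage with a toy troubleshooting flowchart (illustrative data only).
chart = Flowchart(
    start="power",
    nodes={
        "power": "Is the device powered on?",
        "plug": "Plug in the device and retry.",
        "reset": "Hold the reset button for 10 seconds.",
    },
    edges={"power": {"yes": "reset", "no": "plug"}},
)
ex = FlowchartExecutor(chart)
print(ex.allowed_conditions())        # ['no', 'yes']
print(ex.step("maybe"))               # ('power', False)
print(ex.step("no"))                  # ('plug', True)
print(ex.path)                        # ['power', 'plug']
print(chart.is_terminal(ex.current))  # True
```

Keeping the dialogue state and the transition check outside the LLM means that, in this sketch, a hallucinated intent can at worst leave the dialogue at its current node; it cannot produce a transition that the flowchart does not contain.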
