FSM: A Finite State Machine Based Zero-Shot Prompting Paradigm for Multi-Hop Question Answering

Xiaochen Wang, Junqing He, Zhe Yang, Yiru Wang, Xiangdi Meng, Kunhao Pan, Zhifang Sui

TL;DR

Multi-hop QA with LLMs is hampered by hallucination, error propagation, and limited context length. The authors introduce FSM, a zero-shot prompting paradigm that decomposes a question into sub-questions and moves between automaton-like states with self-correction, implemented in two stages. On the relatively simpler HotpotQA and 2Wiki datasets, FSM performs roughly on par with baselines; on the hardest dataset, Musique, it shows notable gains. It also reduces formatting errors and follows the required output format more reliably. By controlling intermediate reasoning without extensive demonstrations, the approach promises greater trustworthiness and broader applicability, including to NL2SQL.

Abstract

Large Language Models (LLMs) with chain-of-thought (COT) prompting have demonstrated impressive abilities on simple natural language inference tasks. However, they tend to perform poorly on Multi-hop Question Answering (MHQA) tasks because of several challenges, including hallucination, error propagation, and limited context length. We propose a prompting method, Finite State Machine (FSM), that enhances the reasoning capabilities of LLMs on complex tasks while also improving effectiveness and trustworthiness. Unlike COT methods, FSM addresses MHQA by iteratively decomposing a question into multi-turn sub-questions and self-correcting promptly, which improves the accuracy of the answer at each step. Specifically, FSM addresses one sub-question at a time and decides on the next step based on its current result and state, in an automaton-like format. Experiments on benchmarks show the effectiveness of our method: although it performs on par with the baseline on relatively simple datasets, it excels on challenging datasets such as Musique. Moreover, the approach mitigates hallucination, in that the correct final answer can be recovered despite errors in intermediate reasoning. Furthermore, our method improves LLMs' ability to follow specified output format requirements, significantly reducing the difficulty of answer interpretation and the need for reformatting.
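To make the described control flow concrete, here is a minimal Python sketch of an FSM-style loop. It assumes a hypothetical four-state design (DECOMPOSE, SEARCH, ANSWER, CHECK) loosely read from the abstract and from the search step shown in Figure 2; it is not the authors' prompt or implementation. `run_fsm`, `toy_decompose`, `toy_search`, `toy_answer` and the toy `CORPUS` are illustrative stand-ins for LLM and retriever calls, and the validity check used in the demo only rejects empty answers, so whatever self-correction criterion FSM actually uses is not reproduced here.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List, Optional, Tuple


class State(Enum):
    # Hypothetical state names chosen for illustration.
    DECOMPOSE = auto()  # propose the next sub-question, or stop
    SEARCH = auto()     # collect candidate paragraphs for the sub-question
    ANSWER = auto()     # answer the sub-question from those paragraphs
    CHECK = auto()      # self-correction: accept the answer or search again
    FINISH = auto()


@dataclass
class Trace:
    sub_questions: List[str] = field(default_factory=list)
    answers: List[str] = field(default_factory=list)


def run_fsm(question: str,
            decompose: Callable[[str, Trace], Optional[str]],
            search: Callable[[str, int], List[str]],
            answer: Callable[[str, List[str]], str],
            is_valid: Callable[[str], bool],
            max_retries: int = 2, max_steps: int = 50) -> Tuple[str, Trace]:
    """Resolve one sub-question at a time; each transition depends on the state and the latest result."""
    trace = Trace()
    state = State.DECOMPOSE
    sub_q, paragraphs, ans, attempt = "", [], "", 0
    for _ in range(max_steps):  # hard cap so a misbehaving model cannot loop forever
        if state is State.DECOMPOSE:
            nxt = decompose(question, trace)
            if nxt is None:
                state = State.FINISH  # no further sub-question is needed
                break
            sub_q, attempt = nxt, 0
            state = State.SEARCH
        elif state is State.SEARCH:
            paragraphs = search(sub_q, attempt)
            state = State.ANSWER
        elif state is State.ANSWER:
            ans = answer(sub_q, paragraphs)
            state = State.CHECK
        elif state is State.CHECK:
            if is_valid(ans) or attempt >= max_retries:
                trace.sub_questions.append(sub_q)
                trace.answers.append(ans)
                state = State.DECOMPOSE
            else:
                attempt += 1  # retry this sub-question instead of passing the error on
                state = State.SEARCH
    final = trace.answers[-1] if trace.answers else ""
    return final, trace


# Toy, deterministic stand-ins for the LLM and retriever (hypothetical).
CORPUS = {
    "Film X": ["Film X was directed by Alice Smith."],
    "Alice Smith": ["Alice Smith is married to Bob Jones."],
}


def toy_decompose(question: str, trace: Trace) -> Optional[str]:
    # Scripted plan: the second sub-question is built from the first hop's answer.
    if not trace.answers:
        return "Who directed Film X?"
    if len(trace.answers) == 1:
        return f"Who is the spouse of {trace.answers[0]}?"
    return None


def toy_search(sub_q: str, attempt: int) -> List[str]:
    # The first search for the second hop deliberately returns nothing,
    # so the CHECK state has an error to correct.
    if "spouse" in sub_q and attempt == 0:
        return []
    return [p for name, ps in CORPUS.items() if name in sub_q for p in ps]


def toy_answer(sub_q: str, paragraphs: List[str]) -> str:
    # Extract the entity after a fixed cue phrase; empty string if none is found.
    for p in paragraphs:
        for cue in ("directed by ", "married to "):
            if cue in p:
                return p.split(cue, 1)[1].rstrip(".")
    return ""


if __name__ == "__main__":
    final, trace = run_fsm("Who is the spouse of the director of Film X?",
                           toy_decompose, toy_search, toy_answer,
                           is_valid=lambda a: a != "")
    print(final)  # Bob Jones
```

In this toy run the second hop first comes back empty, the CHECK state sends it back to SEARCH, and only the corrected answer is recorded. Keeping the state explicit means each sub-question is answered and checked before the next one is issued, which mirrors how the abstract describes FSM improving accuracy at each step.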

Paper Structure (18 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The abstract flow chart of FSM
  • Figure 2: The flow chart of the proposed FSM, with a simple case in detail. A multi-hop question is solved step by step. The book icon indicates candidate paragraphs in the search step, and the robot icon denotes the LLM.
  • Figure 3: The outputs of FSM are in standard JSON format.
  • Figure 4: Examples of format errors in COT outputs.
  • Figure 5: Contrast between the baseline and FSM, with error examples from the baseline.
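Figures 3 and 4 contrast FSM's JSON outputs with COT format errors. As an illustration of why a fixed JSON output format makes answers easier to interpret, the following Python sketch checks raw step outputs mechanically. It assumes a hypothetical schema with `sub_question` and `answer` keys; `parse_step_output`, `format_error_rate` and the sample strings are invented for illustration and are not the paper's evaluation code.

```python
import json
from typing import Optional, Sequence

# Hypothetical field names; the schema shown in the paper's Figure 3 may differ.
REQUIRED_KEYS = ("sub_question", "answer")


def parse_step_output(raw: str, required: Sequence[str] = REQUIRED_KEYS) -> Optional[dict]:
    """Return the parsed step if it is a JSON object containing every required key, else None."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return None  # free-form prose, truncated JSON, etc.
    if not isinstance(obj, dict):
        return None  # valid JSON, but not an object (e.g. a list)
    if any(k not in obj for k in required):
        return None  # object is missing a required field
    return obj


def format_error_rate(outputs: Sequence[str]) -> float:
    """Fraction of raw outputs that parse_step_output rejects."""
    if not outputs:
        return 0.0
    failures = sum(parse_step_output(o) is None for o in outputs)
    return failures / len(outputs)


if __name__ == "__main__":
    samples = [
        '{"sub_question": "Who directed Film X?", "answer": "Alice Smith"}',  # well-formed
        "The director is Alice Smith, so the answer is Bob Jones.",           # free-form prose
        '{"sub_question": "Who directed Film X?"}',                           # missing key
        '{"sub_question": "Who directed Film X?", "answer": "Alice',          # truncated JSON
    ]
    print(format_error_rate(samples))  # 0.75
```

A structured output either parses into the expected fields or is rejected outright, so no heuristic answer extraction is needed; free-form COT text, by contrast, would have to be reinterpreted or reformatted before its answer could be used.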