
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs

Ziyi Tang, Ruilin Wang, Weixing Chen, Yongsen Zheng, Zechuan Chen, Yang Liu, Keze Wang, Tianshui Chen, Liang Lin

TL;DR

This work introduces CaCo-CoT, a multi-agent framework that uses faithful reasoners and causal evaluators to promote causal consistency in knowledge-based reasoning with foundation models. Through a reasoning-and-consensus architecture, it explicitly handles factual and inferential errors via non-causal and counterfactual evaluations, triggering iterative re-reasoning when needed. Empirical results across science QA, commonsense, and multi-modal benchmarks show state-of-the-art performance and robust reasoning fidelity, with ablations highlighting the evaluator's crucial role. The approach applies to both text and multi-modal domains; the authors also note limitations such as shared biases and potential error accumulation among agents.

Abstract

Despite the progress of foundation models, knowledge-based reasoning remains a persistent challenge due to their limited capacity for knowledge recall and inference. Existing methods primarily focus on encouraging these models to plan and solve problems or extensively sample reasoning chains independently. However, these methods often overlook conceptual errors and inferential fallacies, inevitably leading to a series of notorious issues such as misleading conclusions, cognitive biases, and reduced decision quality. While explicit modeling of causality is argued to hold promise in addressing these issues, contemporary research efforts have thus far fallen short in achieving causality-based foundation models. Drawing inspiration from the orchestration of diverse specialized agents collaborating to tackle intricate tasks, we propose a framework named Causal-Consistency Chain-of-Thought (CaCo-CoT) that harnesses multi-agent collaboration to bolster the faithfulness and causality of foundation models, involving a set of reasoners and evaluators. These agents collaboratively work within a reasoning-and-consensus paradigm to improve faithfulness. The reasoners are tasked with generating reasoning chains for knowledge-intensive problems by mimicking human causal reasoning. Meanwhile, the evaluator scrutinizes the causal consistency of a reasoner's reasoning chain from a non-causal and a counterfactual perspective. Our framework demonstrates significant superiority over state-of-the-art methods through extensive and comprehensive evaluations across text-based and multi-modal knowledge reasoning tasks (e.g., science question answering and commonsense reasoning).
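
The reasoning-and-consensus paradigm described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Reasoner` and `Evaluator` callables, the majority-vote consensus rule, the `max_rounds` cap, and the fallback to the last majority answer are all assumptions made for illustration. In practice each callable would wrap an LLM call, and the evaluator would carry out the non-causal and counterfactual checks.

```python
from collections import Counter
from typing import Callable, List, Tuple

# Hypothetical agent interfaces (assumptions, not from the paper):
# a reasoner maps a question to (reasoning_chain, answer);
# an evaluator returns True if it judges the chain causally consistent.
Reasoner = Callable[[str], Tuple[str, str]]
Evaluator = Callable[[str, str, str], bool]


def reason_with_consensus(
    question: str,
    reasoners: List[Reasoner],
    evaluators: List[Evaluator],
    max_rounds: int = 3,
) -> str:
    """Repeat reasoning until the evaluators accept the majority answer."""
    if not reasoners:
        raise ValueError("at least one reasoner is required")

    majority = ""
    for _ in range(max_rounds):
        # Each reasoner independently produces a chain and an answer.
        solutions = [reasoner(question) for reasoner in reasoners]

        # Assumed consensus rule: simple majority vote over the answers.
        votes = Counter(answer for _, answer in solutions)
        majority = votes.most_common(1)[0][0]

        # Take one chain that supports the majority answer and have every
        # evaluator scrutinize it.
        chain = next(c for c, answer in solutions if answer == majority)
        if all(ev(question, chain, majority) for ev in evaluators):
            return majority
        # Otherwise fall through and trigger another round of re-reasoning.

    # No round passed evaluation: return the last majority answer (assumption).
    return majority
```

Passing the agents in as plain callables keeps the control flow separate from any particular model API, so the loop can be exercised with deterministic stubs before real models are attached.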

Paper Structure

This paper contains 24 sections, 9 equations, 9 figures, 3 tables, 1 algorithm.

Figures (9)

  • Figure 1: Overview of CaCo-CoT. Given a question from ScienceQA, the reasoners fail to reach the gold answer (C) in the first cooperation round. To address this, multi-round cooperation between multiple reasoners and evaluators is adopted to yield a faithful answer.
  • Figure 2: Comparison between existing approaches and causal-consistency chain-of-thought (CaCo-CoT). CaCo-CoT introduces a collaborative mechanism where faithful reasoner agents (upper left) and causal evaluator agents (right) cooperate to produce a causally consistent reasoning chain, aiming to minimize factual and inferential errors.
  • Figure 3: Demonstration of how a faithful reasoner and a causal evaluator analyze a molecular geometry question.
  • Figure 4: Performance improvement of CaCo-CoT over native CoT on the MME commonsense reasoning split. Following the standard setting (Fu et al., 2023), ACC (%) is the accuracy over individual image-question pairs, and ACC+ measures whether both questions associated with an image are answered correctly.
  • Figure 5: Performance comparison on the MMMU dataset with Llama3-LLaVA-Next-8B and GPT-4o-mini.
  • ...and 4 more figures
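
The ACC and ACC+ metrics mentioned in the Figure 4 caption can be made concrete with a short sketch. The function name `mme_acc_and_acc_plus` and the `(image_id, is_correct)` record format are hypothetical; the definitions follow the caption's wording, with ACC computed over individual image-question pairs and ACC+ over images whose questions are all answered correctly.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mme_acc_and_acc_plus(records: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (ACC, ACC+) in percent from (image_id, is_correct) records."""
    # Group per-question correctness flags by the image they belong to.
    per_image: Dict[str, List[bool]] = defaultdict(list)
    for image_id, is_correct in records:
        per_image[image_id].append(is_correct)
    if not per_image:
        raise ValueError("no records given")

    # ACC: fraction of image-question pairs answered correctly.
    total = sum(len(flags) for flags in per_image.values())
    correct = sum(sum(flags) for flags in per_image.values())
    acc = 100.0 * correct / total

    # ACC+: fraction of images whose questions are all answered correctly.
    fully_correct = sum(all(flags) for flags in per_image.values())
    acc_plus = 100.0 * fully_correct / len(per_image)
    return acc, acc_plus


# Toy data: img1 has both questions right, img2 has only one right.
records = [("img1", True), ("img1", True), ("img2", True), ("img2", False)]
print(mme_acc_and_acc_plus(records))  # (75.0, 50.0)
```

ACC+ is the stricter of the two: a model that answers one question per image correctly by chance can still reach 50% ACC while scoring 0% on ACC+.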