Critical-Questions-of-Thought: Steering LLM reasoning with Argumentative Querying

Federico Castagna, Isabel Sassoon, Simon Parsons

TL;DR

This paper addresses persistent reasoning gaps in state-of-the-art LLMs for logic and math by introducing Critical-Questions-of-Thought (CQoT), a four-step, Toulmin-informed pipeline that probes and corrects reasoning via eight critical questions before producing final answers. It combines argumentation theory with test-time compute to guide LLMs through a reasoning plan, iteratively validating premises and warrants. Empirical evaluation on MT-Bench Reasoning and Math across multiple models shows CQoT achieving about a 5% average improvement over baselines and standard Chain-of-Thought, with open-source models occasionally outperforming proprietary ones. The work demonstrates a practical, model-agnostic approach that enhances reasoning under incomplete information and supports open science by using freely available LLMs and sharing the pipeline publicly.

Abstract

Studies have underscored how, despite recent breakthroughs and swift advances in AI research, even state-of-the-art Large Language Models (LLMs) continue to struggle when performing logical and mathematical reasoning. The results seem to suggest that LLMs still work as (highly advanced) data pattern identifiers, scoring poorly when attempting to generalise and solve reasoning problems that the models have never seen before or that are not close to samples in their training data. To address this concern, this paper makes use of the notion of critical questions from the literature on argumentation theory, focusing in particular on Toulmin's model of argumentation. We show that employing these critical questions can improve the reasoning capabilities of LLMs. By probing the rationale behind the model's reasoning process, the LLM can assess whether a logical mistake is occurring and correct it before providing the final reply to the user prompt. The underlying idea is drawn from the gold standard of any valid argumentative procedure: the conclusion is valid if it is entailed by accepted premises. Or, to paraphrase this Aristotelian principle in a real-world approximation characterised by incomplete information and presumptive logic, the conclusion is valid if not proved otherwise. This approach successfully steers the models' output through a reasoning pipeline, resulting in better performance than both the baseline and its Chain-of-Thought (CoT) implementation. To support this claim, we provide an extensive evaluation of the proposed approach on the MT-Bench Reasoning and Math tasks across a range of LLMs.
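The probe-and-correct idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the excerpt does not list the four steps or the eight critical questions, so the questions below are hypothetical placeholders, and the function name, the prompt wording, the `min_passed` threshold, the `max_rounds` cap, and the `llm` callable (any function mapping a prompt string to a reply string) are all assumptions made for illustration.

```python
from typing import Callable, List

# Hypothetical placeholder questions. The paper's actual eight
# critical questions are not given in this excerpt.
PLACEHOLDER_QUESTIONS: List[str] = [
    "Are the premises clearly stated and acceptable?",
    "Does the conclusion follow from the premises?",
]


def critical_question_loop(
    prompt: str,
    llm: Callable[[str], str],
    questions: List[str],
    min_passed: int,
    max_rounds: int = 3,
) -> str:
    """Draft a reasoning plan, probe it with questions, redraft on failure, then answer.

    `min_passed` and `max_rounds` are assumed knobs, not values from the paper.
    """
    # Ask the model for a reasoning plan before any final answer.
    plan = llm(f"Outline the reasoning steps for: {prompt}")

    for _ in range(max_rounds):
        # Probe the plan with each question and collect PASS/FAIL verdicts.
        verdicts = [
            llm(f"Plan:\n{plan}\nQuestion: {q}\nAnswer PASS or FAIL.")
            for q in questions
        ]
        passed = sum(v.strip().upper().startswith("PASS") for v in verdicts)
        if passed >= min_passed:
            break
        # Too many failed checks: ask for a revised plan and probe again.
        plan = llm(
            f"Revise the reasoning steps for: {prompt}\nPrevious plan:\n{plan}"
        )

    # Produce the final reply from the last plan (validated, or the
    # latest revision if the round cap was reached).
    return llm(f"Using this plan:\n{plan}\nAnswer: {prompt}")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; a real setup would wrap an API call in a function with the same signature.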

Paper Structure

This paper contains 22 sections, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Toulmin's schema: the case of Harry's nationality.
  • Figure 2: The four-step process of the CQoT pipeline.
  • Figure 3: Comparison between the responses given by the baseline Llama 3.1 70b-Instruct (wrong, coloured red) and its CQoT counterpart (correct, coloured green). Note that the multiple 'Step' labels in the latter reply do not refer to the CQoT pipeline; they are simply how the model phrased its output.
  • Figure 4: Two-step pipeline for the ablation study.
  • Figure 5: Comparison between the performance achieved by the baseline model with (CQoT) and without (Standard) the Critical-Questions-of-Thought approach.
  • ...and 6 more figures