LOGIC-LM++: Multi-Step Refinement for Symbolic Formulations

Shashank Kirtania, Priyanshu Gupta, Arjun Radhakrishna

TL;DR

The paper addresses the semantic weaknesses of LLM-based symbolic reasoning by introducing Logic-LM++, which augments the Logic-LM framework with pairwise comparison-based semantic checks and richer refinement context. It adds a Self-Refinement Agent to focus refinements on the problem statement and a Backtracking Agent to prune non-improving edits, aiming for semantically correct symbolic formulations. Evaluations on FOLIO, AR-LSAT, and ProofWriter show substantial improvements over baselines, including notable gains in execution accuracy and consistency across prompting regimes. The work demonstrates the potential to generalize semantic refinement to tool-augmented reasoning, while acknowledging limitations with initial formulations and smaller LLMs affecting semantic capture.

Abstract

In this paper we examine the limitations of Large Language Models (LLMs) for complex reasoning tasks. Although recent works have started to employ formal languages as an intermediate representation for reasoning tasks, they often face challenges in accurately generating and refining these formal specifications to ensure correctness. To address these issues, this paper proposes Logic-LM++, an improvement on Logic-LM. It uses the ability of LLMs to do pairwise comparisons, allowing the evaluation of the refinements suggested by the LLM. The paper demonstrates that Logic-LM++ outperforms Logic-LM and other contemporary techniques across natural language reasoning tasks on three datasets, FOLIO, ProofWriter and AR-LSAT, with an average improvement of 18.5% over standard prompting, 12.3% over chain-of-thought prompting and 5% over Logic-LM.
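The refinement idea summarized above (propose a refined formulation, compare it pairwise against the current one, and discard it if it is not an improvement) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `max_rounds` parameter, and the toy stand-ins for the LLM calls are all hypothetical, and the real system's prompts, solver feedback, and stopping rule are not described in this summary.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, introduced only for this sketch.
# refine(problem, current) -> candidate formulation (an LLM call in practice)
Refiner = Callable[[str, str], str]
# prefers_candidate(problem, current, candidate) -> True if the candidate
# is judged the better formulation (a pairwise LLM comparison in practice)
Comparator = Callable[[str, str, str], bool]


def refine_with_backtracking(
    problem: str,
    initial: str,
    refine: Refiner,
    prefers_candidate: Comparator,
    max_rounds: int = 3,  # assumed round limit, not taken from the paper
) -> Tuple[str, List[str]]:
    """Refine a formulation, keeping an edit only if it wins the comparison."""
    current = initial
    history = [initial]
    for _ in range(max_rounds):
        candidate = refine(problem, current)
        if prefers_candidate(problem, current, candidate):
            current = candidate  # accept the refinement
        # otherwise backtrack: the non-improving candidate is dropped
        history.append(current)
    return current, history


def demo() -> Tuple[str, List[str]]:
    # Toy stand-ins for the LLM: fixed proposals and a made-up quality score.
    scores = {"F0": 1, "F1": 3, "F2": 2, "F3": 5}
    proposals = iter(["F1", "F2", "F3"])

    def refine(problem: str, current: str) -> str:
        return next(proposals)

    def prefers_candidate(problem: str, current: str, candidate: str) -> bool:
        return scores[candidate] > scores[current]

    return refine_with_backtracking("toy problem", "F0", refine, prefers_candidate)


if __name__ == "__main__":
    final, history = demo()
    print(final, history)
```

In the toy run, the second proposal scores lower than the formulation already held, so it is rejected and the loop carries the earlier formulation forward to the next round.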

Paper Structure (14 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Refinement of logical formulations in Logic-LM
  • Figure 2: Improvement in refinement by Logic-LM++
  • Figure 3: Accuracy in subsequent rounds of refinement. The grey line here represents the accuracy scores on self-refinement without backtracking with GPT-4.
  • Figure 4: Number of symbolic formulations corrected after each turn of self-refinement with backtracking agent (purple) and without backtracking agent (green) in FOLIO with GPT-4.