ChatLogic: Integrating Logic Programming with Large Language Models for Multi-Step Reasoning

Zhongsheng Wang; Jiamou Liu; Qiming Bao; Hongfei Rong; Jingfeng Zhang

TL;DR

ChatLogic tackles the challenge of robust multi-step reasoning in LLMs by integrating a symbolic logic engine via pyDatalog: natural language problems are converted into logic programs and executed locally. The framework combines semantic and syntax correction with Mix-shot Chain of Thought (CoT) prompting to guide LLMs toward correct symbolic representations and reliable code generation. Across the PARARULE-Plus and CONCEPTRULES datasets, ChatLogic consistently improves reasoning accuracy and code executability, with notable gains for smaller models and high-depth tasks. This approach enhances the transparency, traceability, and practicality of LLM-based reasoning in real-world multi-step tasks.

Abstract

Large language models (LLMs) such as ChatGPT and GPT-4 have demonstrated impressive capabilities in various generative tasks. However, their performance is often hampered by limitations in accessing and leveraging long-term memory, leading to specific vulnerabilities and biases, especially during long interactions. This paper introduces ChatLogic, an innovative framework specifically targeted at LLM reasoning tasks that can enhance the performance of LLMs in multi-step deductive reasoning tasks by integrating logic programming. In ChatLogic, the language model plays a central role, acting as a controller and participating in every system operation stage. We propose a novel method of converting logic problems into symbolic integration with an inference engine. This approach leverages large language models' situational understanding and imitation skills and uses symbolic memory to enhance multi-step deductive reasoning capabilities. Our results show that the ChatLogic framework significantly improves the multi-step reasoning capabilities of LLMs. The source code and data are available at https://github.com/Strong-AI-Lab/ChatLogic.
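
To make the idea of handing multi-step deduction to a symbolic engine concrete, the following is a minimal sketch of forward chaining in plain Python. It is not the ChatLogic implementation and does not use pyDatalog; the `forward_chain` function, the `Rule` type, and the toy facts and rules are all assumptions introduced for illustration. It only shows why a symbolic engine can follow a long inference chain reliably once facts and rules are in symbolic form.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A rule is (premises, conclusion): if every premise is known, the
# conclusion becomes known. Hypothetical representation for this sketch.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return every fact derivable from `facts` by repeatedly applying `rules`."""
    known: Set[str] = set(facts)
    rule_list: List[Rule] = list(rules)
    changed = True
    while changed:  # stop once a full pass adds nothing new
        changed = False
        for premises, conclusion in rule_list:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Toy problem (assumed, not taken from the datasets): a four-step chain.
facts = {"A"}
rules = [
    (frozenset({"A"}), "B"),
    (frozenset({"B"}), "C"),
    (frozenset({"C"}), "E"),
    (frozenset({"E"}), "F"),
    (frozenset({"X"}), "Y"),  # never fires, because X is never derived
]
derived = forward_chain(facts, rules)
print("F" in derived)  # the query "does A lead to F?"
print("Y" in derived)
```

In this sketch the answer to a query is a simple membership test on the derived set, so the number of reasoning steps affects only the run time of the loop, not the correctness of the answer.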

Paper Structure (18 sections, 3 equations, 3 figures, 4 tables)

This paper contains 18 sections, 3 equations, 3 figures, 4 tables.

Introduction
Related Work
LLMs Reasoning:
LLMs Code Generation:
LLMs Prompt Engineering:
Task Definition
Augmenting the Inferential Abilities of LLMs
Amplifying the Executability of Automated Code Generation Processes
ChatLogic
Framework Overview
Mix-shot CoT
Evaluation
Datasets and Metrics
LLMs Configuration
Intermediate Process
...and 3 more sections

Figures (3)

Figure 1: Demo illustrating how LLMs can effectively identify and follow correct and logical reasoning paths to solve complex multi-step reasoning problems. In this instance, our objective is to let LLMs recognize the presence of an established path, ABCEF, thereby enabling them to accurately deduce that the statement 'A infers F' is true.
Figure 2: A more detailed view of ChatLogic. LLMs act as controllers: they call appropriate demonstration examples from the Prompt Templates and guide the semantic correction (SE) and syntax correction (SYN) modules to output correct code, which is then executed to produce results. The figure excerpts a specific question from PARARULE-Plus together with its code generation process. The yellow portion shows the contribution of SE, and the cyan portion shows the contribution of SYN.
Figure 3: Comparison based on the PARARULE-Plus dataset shows that while ChatGPT, even with CoT reasoning, often leads to incorrect inferences, the ChatLogic framework (also driven by ChatGPT) in most cases accurately generates pyDatalog code, highlighting its more reliable reasoning proficiency.
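
The syntax correction step mentioned in the Figure 2 caption can be pictured as a check-and-repair loop. The sketch below is an assumption-laden illustration, not the paper's module: `repair_until_valid`, the `fix` callable (a stand-in for an LLM call), and `max_rounds` are hypothetical names, and the check only verifies that the code parses with Python's built-in `compile`, which is weaker than checking that it executes correctly.

```python
from typing import Callable, Tuple


def repair_until_valid(
    code: str,
    fix: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, bool]:
    """Return (code, ok). While `code` fails to parse, ask `fix` to revise it.

    `fix(code, error_message)` is a hypothetical stand-in for a model call
    that receives the faulty code plus the parser's error message.
    """
    for round_index in range(max_rounds + 1):
        try:
            compile(code, "<generated>", "exec")  # parse only, do not run
            return code, True
        except SyntaxError as err:
            if round_index == max_rounds:
                break  # repair budget exhausted
            code = fix(code, f"{err.msg} (line {err.lineno})")
    return code, False


# Toy fixer (assumed): appends a closing parenthesis, ignoring the message.
broken = "print('hello'"
fixed, ok = repair_until_valid(broken, lambda code, message: code + ")")
print(fixed, ok)
```

Bounding the number of rounds keeps the loop from running indefinitely when the fixer cannot produce parseable code; in that case the last attempt is returned together with `False`.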
