Can Language Models Pretend Solvers? Logic Code Simulation with LLMs

Minyu Chen, Guoqiang Li, Ling-I Wu, Ruibang Liu, Yuxin Su, Xi Chang, Jianxin Xue

TL;DR

This study examines a novel task, logic code simulation, which requires LLMs to emulate logical solvers by predicting the results of logic programs, and introduces an LLM-based code simulation technique, Dual Chains of Logic (DCoL).

Abstract

Transformer-based large language models (LLMs) have demonstrated significant potential in addressing logic problems. Capitalizing on the strong capabilities of LLMs in code-related activities, several frameworks that leverage logical solvers for logic reasoning have been proposed recently. While existing research predominantly views LLMs as natural language logic solvers or translators, their roles as logic code interpreters and executors have received limited attention. This study examines a novel task, logic code simulation, which requires LLMs to emulate logical solvers by predicting the results of logic programs. To investigate this task, we formulate three research questions: Can LLMs efficiently simulate the outputs of logic code? What strengths arise with logic code simulation? And what pitfalls? To address these questions, we curate three novel datasets tailored to the logic code simulation task and conduct thorough experiments to establish the baseline performance of LLMs in code simulation. Subsequently, we introduce an LLM-based code simulation technique, Dual Chains of Logic (DCoL). This technique advocates a dual-path thinking approach for LLMs and demonstrates state-of-the-art performance compared to other LLM prompting strategies, achieving a notable accuracy improvement of 7.06% with GPT-4-Turbo.
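To make the task concrete, the following is a minimal sketch of the kind of verdict a solver produces and that an LLM would be asked to predict without running anything. It is not taken from the paper: the `check` helper, the constraints, and the finite search domain are all illustrative assumptions, and a brute-force search over a small domain only stands in for a real SMT solver (here "unsat" means "no model within the searched domain").

```python
from itertools import product

def check(constraints, variables, domain):
    """Brute-force stand-in for a solver (illustrative assumption, not the paper's tooling).

    Returns ("sat", model) for the first assignment over `domain` that
    satisfies every constraint, otherwise ("unsat", None).
    """
    for values in product(domain, repeat=len(variables)):
        model = dict(zip(variables, values))
        if all(constraint(model) for constraint in constraints):
            return "sat", model
    return "unsat", None

# Hypothetical logic program: x > 2, y < 10, x + 2*y == 7
constraints = [
    lambda m: m["x"] > 2,
    lambda m: m["y"] < 10,
    lambda m: m["x"] + 2 * m["y"] == 7,
]
result, model = check(constraints, ["x", "y"], range(-10, 11))

# A contradictory program: x > 0 and x < 0
contradiction, _ = check(
    [lambda m: m["x"] > 0, lambda m: m["x"] < 0], ["x"], range(-5, 6)
)
```

In logic code simulation, the model is given the program text and must output the verdict ("sat", "unsat", or "unknown") that the solver would return.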

Paper Structure (15 sections, 1 equation, 8 figures, 4 tables)

Figures (8)

  • Figure 1: An overview of concepts in our research. The solid line illustrates current research methods, encompassing two approaches to natural language problem-solving: LLM-based logic reasoning and solver-augmented LLM reasoning. Both methods leverage logic understanding with LLMs (indicated by the black solid line) but diverge in their reliance on logic solvers. The dotted lines represent crucial issues discussed in this paper that are not mentioned in previous studies.
  • Figure 2: Samples of the logical problems studied in our research. The light gray boxes display the ground truth of the given problems, while the results given by the SMT solver are presented in the dark gray boxes. 'UNK' denotes unknown.
  • Figure 3: Overview of the DCoL method: DCoL offers two hypotheses, SAT (satisfiable) and UNSAT (unsatisfiable), for logic code simulation. The LLM verifies each hypothesis individually before combining the results to reach the final decision, whereas the CoT method outputs only one possible reasoning path.
  • Figure 4: Prompting template of the DCoL method. Prompts can be modified slightly according to specific tasks.
  • Figure 5: Prompting template of the DCoL method. Prompts can be modified slightly according to specific tasks.
  • ...and 3 more figures
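
The dual-path idea described in the Figure 3 caption can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the `ask` callable is a hypothetical stand-in for an LLM call that reports whether a hypothesis was supported, the prompt wording is invented, and the combination rule (fall back to "unknown" when the two paths agree or both fail) is an assumption, since the caption does not specify how the paths are merged.

```python
from typing import Callable

def dual_chain_decide(program: str, ask: Callable[[str], bool]) -> str:
    """Check the SAT and UNSAT hypotheses separately, then combine them.

    `ask` is a hypothetical LLM wrapper: it receives a prompt and returns
    True if the model judged the stated hypothesis to be supported.
    """
    # Path 1: assume satisfiable and try to construct a model.
    sat_supported = ask(
        "Assume the program is satisfiable. Try to construct a model.\n" + program
    )
    # Path 2: assume unsatisfiable and try to derive a contradiction.
    unsat_supported = ask(
        "Assume the program is unsatisfiable. Try to derive a contradiction.\n" + program
    )
    # Assumed combination rule: accept a verdict only when exactly one path holds.
    if sat_supported and not unsat_supported:
        return "sat"
    if unsat_supported and not sat_supported:
        return "unsat"
    return "unknown"

def fake_ask(prompt: str) -> bool:
    """Stub for demonstration: supports only the SAT hypothesis."""
    return prompt.startswith("Assume the program is satisfiable")

verdict = dual_chain_decide("x > 0", fake_ask)
```

A single-path chain-of-thought prompt would correspond to calling the model once and taking its verdict directly; the sketch differs only in that both hypotheses are examined before a decision is made.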