Lyria: A Genetic Algorithm-Driven Neuro-Symbolic Reasoning Framework for LLMs

Weizhi Tang, Kwabena Nuamah, Vaishak Belle

TL;DR

Lyria is a neuro-symbolic reasoning framework that couples LLMs with genetic algorithms and symbolic systems to address two persistent challenges in LLM reasoning: getting trapped in local optima and incomplete exploration of the solution space. The framework comprises seven components (Error Detector, Deduplicator, Experience Pool, Fitness Evaluator, Selector, Crossover Operator, and Mutation Operator) and supports both oracle-based and LLM-based evaluators, as well as external and LLM-guided crossover/mutation strategies. Extensive experiments on Sudoku, Graph Coloring, and the Traveling Salesman Problem across four LLMs show consistent performance gains over direct prompting baselines, with ablations highlighting the importance of evaluator reliability and operator design. Building on Lyria, LAFT enables a weaker model to imitate the reasoning of a stronger model within the Lyria framework, yielding substantial improvements and sometimes surpassing stronger baselines. The work also discusses limitations, notably the reliance on oracle-based verification and the need to automate operator design, and points to future directions for broader applicability and efficiency.
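
To make the interplay of the seven components concrete, here is a minimal sketch of a genetic loop on a toy graph-coloring instance. It is not the authors' implementation: the graph, every function name, the order in which components are called, and the use of random one-point crossover and single-node mutation in place of LLM calls are all assumptions made for illustration. Only the component names and the parameters $n_p$ (population size) and $n_g$ (number of generations) come from the summary above.

```python
import random

# Hypothetical toy instance: 4 nodes, 3 colors, 5 edges.
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
N_NODES, N_COLORS = 4, 3

def is_well_formed(cand):
    """Error Detector stand-in: accept only candidates of valid shape and range."""
    return len(cand) == N_NODES and all(0 <= c < N_COLORS for c in cand)

def deduplicate(cands):
    """Deduplicator stand-in: drop repeats while keeping first-seen order."""
    return list(dict.fromkeys(cands))

def fitness(cand):
    """Oracle-based Fitness Evaluator stand-in: fraction of properly colored edges."""
    return sum(cand[u] != cand[v] for u, v in EDGES) / len(EDGES)

def select(pool, k):
    """Selector stand-in: the k highest-scoring candidates seen so far."""
    ranked = sorted(pool.items(), key=lambda kv: kv[1], reverse=True)
    return [cand for cand, _ in ranked[:k]]

def crossover(a, b, rng):
    """Crossover Operator stand-in: one-point crossover (an LLM could do this instead)."""
    cut = rng.randrange(1, N_NODES)
    return a[:cut] + b[cut:]

def mutate(cand, rng):
    """Mutation Operator stand-in: recolor one randomly chosen node."""
    i = rng.randrange(N_NODES)
    return cand[:i] + (rng.randrange(N_COLORS),) + cand[i + 1:]

def record(population, pool):
    """Score new, well-formed, unique candidates and store them in the Experience Pool."""
    for cand in deduplicate(population):
        if is_well_formed(cand):
            pool.setdefault(cand, fitness(cand))

def evolve(n_p=5, n_g=5, seed=0):
    rng = random.Random(seed)
    # An LLM would propose the initial population; here it is sampled at random.
    population = [tuple(rng.randrange(N_COLORS) for _ in range(N_NODES))
                  for _ in range(n_p)]
    experience_pool = {}  # candidate -> fitness score
    for _ in range(n_g):
        record(population, experience_pool)
        parents = select(experience_pool, max(2, n_p // 2))
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(n_p)]
        population = parents + children
    record(population, experience_pool)
    best = max(experience_pool, key=experience_pool.get)
    return best, experience_pool[best]

if __name__ == "__main__":
    print(evolve())
```

Keeping the best candidates in the pool across generations means the returned score never decreases as $n_g$ grows, which is one simple way to read "evolving candidate solutions through generations"; the paper's actual selection and replacement policy may differ.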

Abstract

While LLMs have demonstrated impressive abilities across various domains, they struggle with two major issues. The first is that LLMs become trapped in local optima, and the second is that they lack exhaustive coverage of the solution space. To investigate and mitigate these two issues, we propose Lyria, a neuro-symbolic reasoning framework that integrates LLMs, genetic algorithms, and symbolic systems and comprises 7 essential components. Through extensive experiments with 4 LLMs across 3 types of problems, we demonstrate the efficacy of Lyria. Furthermore, with 7 additional ablation experiments, we systematically analyze and elucidate the factors that affect its performance. In addition, based on Lyria, we extend these ideas to the fine-tuning of LLMs and introduce LAFT, which enables a weaker model to imitate the reasoning process of a stronger model that reasons under the Lyria framework. We demonstrate the effectiveness of LAFT through extensive experiments against 9 constructed baselines. Finally, we discuss the limitations and provide insights into future directions.

Paper Structure

This paper contains 42 sections, 2 equations, 3 figures, 7 tables, and 1 algorithm.

Figures (3)

  • Figure 1: The Lyria reasoning framework consists of 7 essential components: Error Detector, Deduplicator, Experience Pool, Fitness Evaluator, Selector, Crossover Operator, and Mutation Operator. Together they evolve candidate solutions across generations to obtain a superior solution.
  • Figure 2: Performance comparison between Lyria and BoN. The x-axis indexes each parameter set (e.g., index $0$ corresponds to the pair ($n_p=5$, $n_g=5$) for Lyria and $N = 23$ for BoN), and the y-axis shows the corresponding score averaged across $\mathrm{SK}_{PS}$, $\mathrm{GC}_{PS}$, and $\mathrm{TSP}_{PS}$.
  • Figure 3: The overall process of LAFT. The orange dashed-border block depicts the FT process and the blue dashed-border block illustrates the inference process. The FT process begins with a set of questions generated for FT data construction, which are answered by a stronger LLM equipped with the Lyria reasoning framework. During reasoning, detailed reasoning traces are collected, including the initial populations, crossover fragments, and mutation fragments. These fragments are then filtered to remove suboptimal ones and retain beneficial ones, forming an FT dataset. The dataset is used to fine-tune a weaker LLM. During inference, the fine-tuned weaker LLM employs Lyria to reason over new questions and generate final answers.
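
The fragment-filtering step described for Figure 3 can be sketched as follows. The caption does not state the filtering criterion, so this sketch assumes a fragment is "beneficial" when the child it produced scores strictly higher than its parent by more than a threshold. The `Fragment` fields, the `build_ft_dataset` function, the `min_gain` parameter, and the prompt/completion record layout are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Fragment:
    kind: str            # "crossover" or "mutation" (trace type named in the caption)
    prompt: str          # what the stronger LLM was asked (hypothetical field)
    completion: str      # what the stronger LLM produced (hypothetical field)
    parent_score: float  # fitness before the operator was applied
    child_score: float   # fitness after the operator was applied

def build_ft_dataset(fragments: List[Fragment], min_gain: float = 0.0) -> List[Dict[str, str]]:
    """Keep only fragments whose fitness gain exceeds min_gain (assumed criterion)."""
    return [
        {"kind": f.kind, "prompt": f.prompt, "completion": f.completion}
        for f in fragments
        if f.child_score - f.parent_score > min_gain
    ]

# Made-up traces, purely for illustration.
traces = [
    Fragment("crossover", "combine A and B", "child AB", 0.4, 0.8),
    Fragment("mutation", "perturb AB", "child AB'", 0.8, 0.6),
    Fragment("mutation", "perturb C", "child C'", 0.5, 0.5),
]
dataset = build_ft_dataset(traces)
```

With these made-up traces, only the crossover fragment survives, since the two mutation fragments lowered or failed to raise the score. Initial-population traces, which have no parent, would need a separate rule that this sketch does not cover.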