Advanced Weakly-Supervised Formula Exploration for Neuro-Symbolic Mathematical Reasoning

Yuxuan Wu, Hideki Nakayama

TL;DR

This paper tackles the challenge of learning intermediate symbolic formulas for neuro-symbolic mathematical reasoning under weak supervision. It introduces a framework that generalizes formulas to a functional DSL, uses a policy-guided, graph-based search with curriculum sampling, and applies Clean-Up and Reflection to reduce prolixity, all accelerated by asynchronous parallelization. Empirical results on the math_dm dataset show the approach discovers valid formulas across many problem categories and achieves competitive accuracy relative to end-to-end and LLM-based baselines, with ablation studies validating the contributions of the formula graph, reflection, and parallel search. The work advances practical neuro-symbolic reasoning by enabling flexible symbolic representations and scalable exploration without manual formula annotations, while noting limitations in very large search spaces and opportunities to leverage stronger encoders or prior knowledge.

Abstract

In recent years, neuro-symbolic methods have become a popular and powerful approach that augments artificial intelligence systems with the capability to perform abstract, logical, and quantitative deductions with enhanced precision and controllability. Recent studies have successfully performed symbolic reasoning by leveraging various machine learning models to explicitly or implicitly predict intermediate labels that provide symbolic instructions. However, these intermediate labels are not always available as part of the training data for every task, and pre-trained models, represented by Large Language Models (LLMs), do not consistently generate valid symbolic instructions from their intrinsic knowledge. Existing work has developed alternative learning techniques that allow the learning system to autonomously uncover optimal symbolic instructions, but their performance is limited when faced with relatively large search spaces or more challenging reasoning problems. In view of this, we put forward an advanced practice for neuro-symbolic reasoning systems to explore the intermediate labels with weak supervision from problem inputs and final outputs. Our experiments on the Mathematics dataset illustrate the effectiveness of our proposals from multiple aspects.
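
To make "weak supervision from problem inputs and final outputs" concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical four-operator arithmetic DSL (`OPS`) and replaces the policy-guided, graph-based search with plain brute-force enumeration. The names `enumerate_formulas` and `search_valid_formulas` are illustrative only. The only training signal used is whether a candidate formula evaluates to the known final answer.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy DSL (an assumption, not the paper's DSL): four binary operators.
OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "div": lambda a, b: a / b if b != 0 else None,  # None marks an invalid formula
}

def enumerate_formulas(leaves, depth):
    """Return {expression_string: value} for formulas nested up to `depth` levels."""
    pool = {str(x): Fraction(x) for x in leaves}
    for _ in range(depth):
        new = {}
        for (ea, va), (eb, vb) in product(pool.items(), repeat=2):
            for name, fn in OPS.items():
                value = fn(va, vb)
                if value is not None:
                    new[f"{name}({ea}, {eb})"] = value
        pool.update(new)
    return pool

def search_valid_formulas(leaves, answer, depth=1):
    """Keep every formula whose output equals the final answer (the weak label)."""
    target = Fraction(answer)
    return [expr for expr, value in enumerate_formulas(leaves, depth).items()
            if value == target]

# Numbers extracted from a problem such as "What is 6 divided by 3?" with answer 2.
print(search_valid_formulas([6, 3], 2))
# With answer 9, several formulas match; the answer alone cannot tell them apart.
print(search_valid_formulas([6, 3], 9))
```

The second call returns more than one matching formula, which shows why answer-only supervision can admit spurious formulas. The toy pool also grows rapidly with `depth`, a small-scale instance of the search-space limitation the abstract describes.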

Paper Structure

This paper contains 29 sections, 4 equations, 5 figures, 10 tables, 3 algorithms.

Figures (5)

  • Figure 1: The histogram that shows the number of search iterations consumed to acquire each valid formula on three representative problem categories: mul_div_multiple, linear_1d, and conversion.
  • Figure 2: The progress of the formula search illustrated by the correlation between the total number of search iterations and the counts of solved problems on three representative problem categories. "legacy" denotes the implementation of coling.
  • Figure: General Learning Procedure
  • Figure: Formula Search
  • Figure: Asynchronous Formula Scoring