Ontology-Guided Neuro-Symbolic Inference: Grounding Language Models with Mathematical Domain Knowledge

Marcelo Labre

TL;DR

This work investigates grounding language-model reasoning in formal domain knowledge by integrating the OpenMath ontology with a neuro-symbolic inference pipeline. Using the MATH 500 benchmark, it evaluates small to mid-sized models across a five-phase workflow: knowledge base construction, concept extraction, hybrid retrieval, cross-encoder reranking, and augmented inference. Results show that ontology-guided context can improve accuracy when the retrieved OpenMath definitions are highly relevant, but noise from irrelevant symbols degrades performance. Best-of-N sampling generally yields more robust gains across models, though benefits vary with problem type and difficulty because of OpenMath coverage gaps. The study highlights both the promise and the challenges of grounding neural reasoning in formal ontologies for high-stakes domains, and it points to concrete avenues for improving retrieval, injection strategies, and domain coverage to enable practical, trustworthy AI in specialized fields.
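The retrieval, reranking, and injection phases named above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy knowledge base, the two overlap-based scorers (stand-ins for sparse and dense retrieval), the use of reciprocal rank fusion, the reuse of the lexical scorer as a stand-in for a cross-encoder, and every function name and the `threshold` parameter are assumptions made for illustration. Concept extraction is skipped, and the problem text itself serves as the query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    symbol: str  # OpenMath symbol name, written as contentdictionary.symbol
    text: str    # short paraphrased description (illustrative, not the official text)


# Toy knowledge base; a real one would be built from the OpenMath content dictionaries.
KB = [
    Definition("arith1.gcd", "greatest common divisor of integers"),
    Definition("arith1.lcm", "least common multiple of integers"),
    Definition("transc1.sin", "sine function of a real or complex argument"),
]


def _jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0


def lexical_score(query, d):
    """Word-overlap score; a stand-in for a sparse retriever."""
    return _jaccard(set(query.lower().split()), set(d.text.lower().split()))


def trigram_score(query, d):
    """Character-trigram overlap; a crude stand-in for a dense retriever."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}
    return _jaccard(grams(query), grams(d.text))


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion over several ranked lists (assumed fusion rule)."""
    scores = {}
    for ranking in rankings:
        for rank, d in enumerate(ranking, start=1):
            scores[d] = scores.get(d, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def retrieve(query, kb, top_k=2):
    """Hybrid retrieval: rank with both scorers, then fuse the rankings."""
    by_lex = sorted(kb, key=lambda d: lexical_score(query, d), reverse=True)
    by_tri = sorted(kb, key=lambda d: trigram_score(query, d), reverse=True)
    return rrf_fuse([by_lex, by_tri])[:top_k]


def rerank_and_filter(query, candidates, threshold):
    """Rescore candidates and drop those below the threshold."""
    scored = [(lexical_score(query, d), d) for d in candidates]
    kept = [(s, d) for s, d in scored if s >= threshold]
    return [d for _, d in sorted(kept, key=lambda p: p[0], reverse=True)]


def build_prompt(problem, definitions):
    """Inject definitions only if any survive; otherwise return the plain problem."""
    if not definitions:
        return problem
    context = "\n".join(f"- {d.symbol}: {d.text}" for d in definitions)
    return f"Relevant definitions:\n{context}\n\nProblem: {problem}"


problem = "find the greatest common divisor of 12 and 18"
candidates = retrieve(problem, KB)
prompt = build_prompt(problem, rerank_and_filter(problem, candidates, threshold=0.3))
print(prompt)
```

The threshold gate is the design choice worth noting: when no candidate scores high enough, the prompt falls back to the unaugmented problem, which is one simple way to limit the noise from irrelevant symbols described above.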

Abstract

Language models exhibit fundamental limitations -- hallucination, brittleness, and lack of formal grounding -- that are particularly problematic in high-stakes specialist fields requiring verifiable reasoning. I investigate whether formal domain ontologies can enhance language model reliability through retrieval-augmented generation. Using mathematics as a proof of concept, I implement a neuro-symbolic pipeline that uses the OpenMath ontology with hybrid retrieval and cross-encoder reranking to inject relevant definitions into model prompts. Evaluation on the MATH benchmark with three open-source models reveals that ontology-guided context improves performance when retrieval quality is high, but irrelevant context actively degrades it -- highlighting both the promise and the challenges of neuro-symbolic approaches.

Paper Structure (35 sections, 4 equations, 5 figures, 10 tables)

Introduction
Background
Literature Review
The OpenMath Ontology
Hypothesis on Neuro-Symbolic Ontology-Guided Model Inference
Experiments
Project Implementation
Model Choice
Experimental Configurations
Evaluation Metrics
Results
Overall Accuracy Impact
Accuracy by Difficulty Level
Accuracy by Problem Type
Efficiency Analysis: Average Attempts in Best-of-N Mode
...and 20 more sections

Figures (5)

Figure 4.1: $\Delta\text{Accuracy}$ (lines) and $\text{Attempts Ratio}$ (bubbles) by reranker threshold for all MATH 500 problem levels and types.
Figure 4.2: $\Delta\text{Accuracy}$ by MATH 500 problem level and reranker threshold. Blue indicates improvement while red indicates degradation.
Figure 4.3: $\Delta\text{Accuracy}$ by MATH 500 problem type and reranker threshold. Blue indicates improvement while red indicates degradation.
Figure 4.4: $\Delta\text{Attempts}$ by reranker threshold in best-of-N mode. Negative values (blue) indicate that OpenMath requires fewer attempts, while positive values (red) indicate that it requires more.
Figure B.1: System architecture for ontology-guided mathematical inference.
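
The captions above use the $\Delta$ metrics without defining them. A plausible reading, assumed here rather than taken from the paper, is that each is the difference between the OpenMath-augmented run and the baseline run at a given reranker threshold:

$$
\Delta\text{Accuracy} = \text{Accuracy}_{\text{OpenMath}} - \text{Accuracy}_{\text{baseline}},
\qquad
\Delta\text{Attempts} = \text{Attempts}_{\text{OpenMath}} - \text{Attempts}_{\text{baseline}}
$$

The subscripts are hypothetical labels. The sign convention for $\Delta\text{Attempts}$ matches the caption of Figure 4.4, where negative values mean OpenMath requires fewer attempts. Under this reading, a positive $\Delta\text{Accuracy}$ corresponds to the improvement shown in blue in Figures 4.2 and 4.3.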
