LOGicalThought: Logic-Based Ontological Grounding of LLMs for High-Assurance Reasoning

Navapat Nananukul, Yue Zhang, Ryan Lee, Eric Boxer, Jonathan May, Vibhav Giridhar Gogate, Jay Pujara, Mayank Kejriwal

TL;DR

This work tackles the challenge of high-assurance reasoning in critical domains by introducing LOGicalThought (LogT), a neurosymbolic framework that grounds LLMs with dual contexts: a Symbolic Graph Context for structured domain knowledge and a Logic-based Context realized through ErgoAI for defeasible, non-monotonic rules. The approach converts long-form guidelines into a compact, machine-readable knowledge base and executable logic, enabling verifiable inferences over negation, implication, and defeasible reasoning. Across four multi-domain benchmarks, LogT achieves an average improvement of 11.84% over strong baselines and shows particularly large gains for smaller models, with gains of up to 13.2% on implication reasoning. The results also demonstrate that dual-context grounding yields longer, more rule-based reasoning traces, improved robustness, and higher alignment between reasoning and final predictions, underscoring the practical potential of neurosymbolic grounding for high-assurance AI systems.

Abstract

High-assurance reasoning, particularly in critical domains such as law and medicine, requires conclusions that are accurate, verifiable, and explicitly grounded in evidence. This reasoning relies on premises codified from rules, statutes, and contracts. Because these sources contain numerous exceptions, the reasoning is inherently defeasible, or non-monotonic: the introduction of a single fact can invalidate a general rule, which poses significant challenges. While large language models (LLMs) excel at processing natural language, their capabilities in standard inference tasks do not translate to the rigorous reasoning required over high-assurance text guidelines. Core reasoning challenges within such texts often manifest as specific logical structures involving negation, implication, and, most critically, defeasible rules and exceptions. In this paper, we propose a novel neurosymbolically-grounded architecture called LOGicalThought (LogT) that uses an advanced logical language and reasoner in conjunction with an LLM to construct a dual context: a symbolic graph context and a logic-based context. These two context representations transform the problem from inference over long-form guidelines into a compact grounded evaluation. Evaluated on four multi-domain benchmarks against four baselines, LogT improves overall performance by 11.84% across all LLMs. Performance improves significantly across all three modes of reasoning: by up to +10.2% on negation, +13.2% on implication, and +5.5% on defeasible reasoning compared to the strongest baseline.
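To make the notion of defeasible (non-monotonic) reasoning concrete, the following is a minimal Python sketch of a default rule that is overridden once a single extra fact is introduced. It is not the LogT pipeline and does not use ErgoAI; the `Rule` class, the `conclude` function, the priority-based conflict resolution, and the leave-eligibility rules are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: if all conditions hold, the conclusion follows."""
    name: str
    conditions: FrozenSet[str]
    conclusion: str          # a literal; the prefix "not " marks negation
    priority: int = 0        # on conflict, the higher priority wins


def negate(literal: str) -> str:
    """Return the complementary literal."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def conclude(facts: FrozenSet[str], rules: List[Rule], atom: str) -> Optional[bool]:
    """True/False if the top-priority applicable rule settles `atom`, else None."""
    applicable = [
        r for r in rules
        if r.conditions <= facts and r.conclusion in (atom, negate(atom))
    ]
    if not applicable:
        return None          # no rule fires: the atom is undetermined
    top = max(r.priority for r in applicable)
    winners = {r.conclusion for r in applicable if r.priority == top}
    if len(winners) > 1:
        return None          # equally strong rules disagree
    return winners.pop() == atom


# Illustrative guideline (assumed, not from the paper): employees are
# eligible for leave, unless they are on probation.
RULES = [
    Rule("default", frozenset({"employee"}), "eligible_for_leave", priority=0),
    Rule("exception", frozenset({"employee", "on_probation"}),
         "not eligible_for_leave", priority=1),
]

print(conclude(frozenset({"employee"}), RULES, "eligible_for_leave"))
print(conclude(frozenset({"employee", "on_probation"}), RULES, "eligible_for_leave"))
```

With only the fact `employee`, the default rule applies and the conclusion holds; adding `on_probation` activates the higher-priority exception and retracts it. This is the behavior that monotonic inference cannot express, since there adding facts never removes conclusions.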

Paper Structure

This paper contains 38 sections, 4 equations, 33 figures, 9 tables, 1 algorithm.

Figures (33)

  • Figure 1: A schematized illustration of LogT. An LLM initially processes raw guidelines, a scenario, and a hypothesis to create a symbolic graph context. This unified structure contains a rule ontology, factual knowledge triples, and a natural language query. This context acts as an input for two distinct reasoning approaches: (1) it provides rich context for guiding LLM reasoning, and (2) it serves as a blueprint for synthesizing a logic program. Compilable logic programs and queries are formalized into logic-based context, completing the dual neurosymbolic context underlying the approach.
  • Figure 2: The proposed benchmark enhancement workflow for evaluating three modes of reasoning (negation, implication, and defeasibility).
  • Figure 3: Performance evaluation of LogT: (a) shows an accuracy comparison between LogT (indicated by markers) and the strongest baselines (bars) across four benchmarks; (b) details the average number of reasoning steps per trace for both LogT and the CoT baseline; (c) displays the four outcome distributions for LogT and CoT, classified by whether the reasoning and final prediction were correct or incorrect.
  • Figure 4: Hypothesis examples for all benchmarks and reasoning modes.
  • Figure 5: ErgoAI syntax and instructions that go into the ErgoAI program generation prompt.
  • ...and 28 more figures
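The Figure 1 caption describes the symbolic graph context as a single structure holding a rule ontology, factual knowledge triples, and a natural language query, used both as LLM context and as a blueprint for a logic program. A rough way to picture that structure is sketched below in Python. The class name, field layout, and both rendering methods are hypothetical, and the emitted facts use generic Prolog-style notation rather than actual ErgoAI syntax.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class SymbolicGraphContext:
    """Hypothetical container for the three parts named in the Figure 1 caption."""
    ontology: Dict[str, str] = field(default_factory=dict)  # concept -> parent concept
    triples: List[Triple] = field(default_factory=list)     # factual knowledge
    query: str = ""                                         # natural language query

    def to_prompt(self) -> str:
        """Use (1): render the context as text to guide LLM reasoning."""
        lines = ["Ontology:"]
        lines += [f"  {child} is_a {parent}"
                  for child, parent in sorted(self.ontology.items())]
        lines.append("Facts:")
        lines += [f"  ({s}, {p}, {o})" for s, p, o in self.triples]
        lines.append(f"Query: {self.query}")
        return "\n".join(lines)

    def to_logic_facts(self) -> List[str]:
        """Use (2): emit generic Prolog-style facts (assumed notation, not ErgoAI)."""
        return [f"{p}({s}, {o})." for s, p, o in self.triples]


# Illustrative content, invented for this sketch.
ctx = SymbolicGraphContext(
    ontology={"probation_exception": "leave_rule"},
    triples=[("alice", "has_status", "on_probation")],
    query="Is alice eligible for leave?",
)
print(ctx.to_prompt())
print(ctx.to_logic_facts())
```

Keeping both renderings on one object reflects the caption's point that the same context feeds both reasoning paths, so the text shown to the LLM and the facts given to the logic engine cannot drift apart.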