Filling the Gap: Is Commonsense Knowledge Generation useful for Natural Language Inference?
Chathuri Jayaweera, Brianna Yanqui, Bonnie Dorr
TL;DR
This work investigates whether Large Language Models (LLMs) can generate commonsense axioms that aid Natural Language Inference (NLI). It introduces a pipeline that generates axioms (P1), injects them before inference (P2), and compares the result against a direct-inference baseline (P3), together with a hybrid selective-access strategy guided by a helpfulness rating. In experiments on SNLI and ANLI with Llama-3.1-70B-Instruct and gpt-oss-120b, the hybrid approach yields consistent improvements by combining pre-prediction axiom injection with post-prediction reasoning. The findings highlight the value of targeted external commonsense knowledge for NLI and point to two directions for future work: reliably identifying the cases that benefit from such knowledge, and broadening the evaluation to more models and languages.
Abstract
Natural Language Inference (NLI) is the task of determining whether a premise entails, contradicts, or is neutral with respect to a given hypothesis. The task is often framed as emulating human inferential processes, in which commonsense knowledge plays a major role. This study examines whether Large Language Models (LLMs) can generate useful commonsense axioms for Natural Language Inference, and evaluates their impact on performance using the SNLI and ANLI benchmarks with the Llama-3.1-70B and gpt-oss-120b models. We show that a hybrid approach, which selectively provides highly factual axioms based on judged helpfulness, yields consistent accuracy improvements of 1.99% to 6.88% across tested configurations, demonstrating the effectiveness of selective knowledge access for NLI. We also find that this targeted use of commonsense knowledge helps models overcome a bias toward the Neutral class by providing essential real-world context.
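The selective-access idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Axiom` type, the 0-to-1 helpfulness scale, the `threshold` value, and the `toy_generate` and `toy_predict` stand-ins (which replace real LLM calls) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Axiom:
    text: str
    helpfulness: float  # assumed scale: 0.0 (not helpful) to 1.0 (very helpful)


def hybrid_predict(
    premise: str,
    hypothesis: str,
    generate_axioms: Callable[[str, str], List[Axiom]],
    predict: Callable[[str, str, List[str]], str],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> str:
    """Give the model axioms only when they are rated helpful enough."""
    axioms = generate_axioms(premise, hypothesis)
    helpful = [a.text for a in axioms if a.helpfulness >= threshold]
    # With no helpful axioms this reduces to direct inference (empty list).
    return predict(premise, hypothesis, helpful)


# Toy stand-ins for the LLM calls, used only to make the sketch runnable.
def toy_generate(premise: str, hypothesis: str) -> List[Axiom]:
    if "sleeping" in hypothesis:
        return [Axiom("Someone who is running is not sleeping.", 0.9)]
    return [Axiom("People exist.", 0.1)]


def toy_predict(premise: str, hypothesis: str, axioms: List[str]) -> str:
    # Mimics a model that defaults to "neutral" without extra context.
    return "contradiction" if axioms else "neutral"


if __name__ == "__main__":
    print(hybrid_predict("A man is running.", "A man is sleeping.",
                         toy_generate, toy_predict))
```

In this sketch, falling back to direct inference is just a call to the same predictor with an empty axiom list, so the decision of whether to use external knowledge lives entirely in the helpfulness filter.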
