Thunder-NUBench: A Benchmark for LLMs' Sentence-Level Negation Understanding
Yeonkyoung So, Gyuseong Lee, Sungmok Jung, Joonhak Lee, JiA Kang, Sangho Kim, Jaejin Lee
TL;DR
Thunder-NUBench is a sentence-level negation benchmark. It formalizes a truth-functional standard negation operator $Neg(\cdot)$ and contrasts it with local negation, contradiction, and paraphrase to probe semantic reasoning in LLMs. The dataset is built from English sources (HoVer and Wikipedia). It comprises manually curated standard negations and a four-option multiple-choice question (MCQ) evaluation, produced under detailed data-generation guidelines and a multi-stage review process. Experiments across model scales (2–3B, 7–8B, and API-served models) show that few-shot prompting and supervised fine-tuning with LoRA improve performance. Even so, models frequently confuse local negation with standard negation, especially in sentences with complex structure. Thunder-NUBench therefore serves as a diagnostic tool for semantic negation understanding that can guide targeted improvements in reasoning across model types. Its English-only, source-specific data is a limitation, and multilingual extension remains future work.
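Read in the classical sense, "truth-functional" means that the truth value of $Neg(p)$ is fixed by the truth value of $p$ alone and is always its complement. As a sketch of that reading (our gloss, not necessarily the paper's exact formalization), let $v$ be a hypothetical valuation assigning each sentence a truth value in $\{0, 1\}$:

$$
v\big(Neg(p)\big) = 1 - v(p), \qquad \text{hence} \qquad v\big(Neg(Neg(p))\big) = 1 - \big(1 - v(p)\big) = v(p).
$$

A rewrite that negates only part of a sentence need not satisfy the first equation. In the illustrative pair "The museum is open on Mondays" / "The museum, not the library, is open on Mondays", the second sentence still asserts that the museum is open, so whenever it is true the first is true as well.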
Abstract
Negation is a fundamental linguistic phenomenon that poses ongoing challenges for Large Language Models (LLMs), particularly in tasks requiring deep semantic understanding. Current benchmarks often treat negation as a minor detail within broader tasks, such as natural language inference. Consequently, there is a lack of benchmarks specifically designed to evaluate comprehension of negation. In this work, we introduce Thunder-NUBench, a novel benchmark explicitly created to assess sentence-level understanding of negation in LLMs. Thunder-NUBench goes beyond merely identifying surface-level cues by contrasting standard negation with structurally diverse alternatives, such as local negation, contradiction, and paraphrase. This benchmark includes manually curated sentence-negation pairs and a multiple-choice dataset, allowing for a comprehensive evaluation of models' understanding of negation.
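The four-option format can be made concrete with a small scoring sketch. It assumes, hypothetically, that each item offers one candidate per category named above: standard negation as the gold answer, with local negation, contradiction, and paraphrase as distractors. The class `MCQItem`, the function `score`, the category labels, and the example sentences are all illustrative and are not taken from the released dataset. Besides accuracy, the sketch tallies which category each wrong answer came from, the kind of breakdown that surfaces local-negation confusions.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical gold label; the released dataset may name categories differently.
GOLD = "standard_negation"


@dataclass
class MCQItem:
    """One four-option item: a source sentence and its candidate rewrites.

    `options` maps an option letter to a (candidate sentence, category) pair.
    Exactly one option is assumed to carry the gold category.
    """

    sentence: str
    options: dict[str, tuple[str, str]]

    def gold(self) -> str:
        letters = [k for k, (_, cat) in self.options.items() if cat == GOLD]
        if len(letters) != 1:
            raise ValueError("expected exactly one standard-negation option")
        return letters[0]


def score(items: list[MCQItem], predictions: list[str]) -> tuple[float, Counter]:
    """Return accuracy and a count of the category chosen in each wrong answer."""
    if not items or len(items) != len(predictions):
        raise ValueError("need a non-empty item list with one prediction per item")
    correct = 0
    confusions: Counter = Counter()
    for item, pred in zip(items, predictions):
        if pred == item.gold():
            correct += 1
        else:
            # Letters outside the item's options are counted as "invalid".
            _, category = item.options.get(pred, ("", "invalid"))
            confusions[category] += 1
    return correct / len(items), confusions


# Illustrative item, written for this sketch rather than drawn from the benchmark.
example = MCQItem(
    sentence="The museum is open on Mondays.",
    options={
        "A": ("The museum is not open on Mondays.", "standard_negation"),
        "B": ("The museum, not the library, is open on Mondays.", "local_negation"),
        "C": ("The museum is closed on Mondays.", "contradiction"),
        "D": ("On Mondays, the museum is open.", "paraphrase"),
    },
)

accuracy, confusions = score([example, example], ["A", "B"])
print(accuracy, dict(confusions))  # 0.5 {'local_negation': 1}
```

Counting errors by category, rather than reporting accuracy alone, distinguishes a model that mistakes a local negation for the standard one from a model that picks a paraphrase, and the two failures call for different remedies.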
