NegVQA: Can Vision Language Models Understand Negation?

Yuhui Zhang; Yuchang Su; Yiming Liu; Serena Yeung-Levy

TL;DR

NegVQA is a curated VQA benchmark for testing whether vision-language models understand negation. It contains 7,379 negated two-choice questions drawn from diverse domains. GPT-4o is used to generate fluent negations of questions from existing datasets, with answer choices inverted so that a correct response requires actually processing the negation. Twenty VLMs across seven model families are evaluated zero-shot. The results show a pervasive struggle with negation, a U-shaped scaling trend in which performance first declines with increasing model size before improving, and a substantial gap relative to human performance. The benchmark serves as a diagnostic resource and points to concrete directions for improving negation handling in VLMs, in support of safer and more reliable multimodal AI systems.

Abstract

Negation is a fundamental linguistic phenomenon that can entirely reverse the meaning of a sentence. As vision language models (VLMs) continue to advance and are deployed in high-stakes applications, assessing their ability to comprehend negation becomes essential. To address this, we introduce NegVQA, a visual question answering (VQA) benchmark consisting of 7,379 two-choice questions covering diverse negation scenarios and image-question distributions. We construct NegVQA by leveraging large language models to generate negated versions of questions from existing VQA datasets. Evaluating 20 state-of-the-art VLMs across seven model families, we find that these models struggle significantly with negation, exhibiting a substantial performance drop compared to their responses to the original questions. Furthermore, we uncover a U-shaped scaling trend, where increasing model size initially degrades performance on NegVQA before leading to improvements. Our benchmark reveals critical gaps in VLMs' negation understanding and offers insights into future VLM development. Project page available at https://yuhui-zh15.github.io/NegVQA/.
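
The construction and scoring described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the names `TwoChoiceItem`, `negate_item`, and `accuracy` are hypothetical, and the negated question is supplied by hand here, whereas the benchmark generates it with a large language model. The sketch only shows why a model that ignores the negation scores well on the original question and poorly on its negated counterpart.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TwoChoiceItem:
    """A two-choice VQA item (hypothetical structure, image omitted)."""
    question: str
    choices: tuple[str, str]
    answer_index: int  # 0 or 1


def negate_item(item: TwoChoiceItem, negated_question: str) -> TwoChoiceItem:
    """Pair a negated question with the flipped correct answer.

    Assumption: the negated wording comes from elsewhere (an LLM in the
    benchmark, a hand-written string in this sketch).
    """
    return TwoChoiceItem(negated_question, item.choices, 1 - item.answer_index)


def accuracy(items: list[TwoChoiceItem], predictions: list[int]) -> float:
    """Fraction of items whose predicted choice index matches the answer."""
    if not items or len(items) != len(predictions):
        raise ValueError("items and predictions must be non-empty and aligned")
    correct = sum(p == it.answer_index for it, p in zip(items, predictions))
    return correct / len(items)


original = TwoChoiceItem("Is the dog sitting?", ("yes", "no"), 0)
negated = negate_item(original, "Is the dog not sitting?")

# A model that ignores the negation gives the same choice to both questions.
ignores_negation = [0]
drop = accuracy([original], ignores_negation) - accuracy([negated], ignores_negation)
```

In this toy case the accuracy drop is the largest possible, since the same prediction is right for the original question and wrong for the negated one.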
