Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models

Ali Abbasi, Mehdi Taghipour, Rahmatollah Beheshti

TL;DR

Experiments demonstrate improved discrimination of affirmative and negated clinical statements without degrading general vision-language alignment, highlighting the value of causal interpretability for targeted model adaptation in safety-critical medical settings.

Abstract

Negation is a fundamental linguistic operation in clinical reporting, yet vision-language models (VLMs) frequently fail to distinguish affirmative from negated medical statements. To systematically characterize this limitation, we introduce a radiology-specific diagnostic benchmark that evaluates polarity sensitivity under controlled clinical conditions, revealing that common medical VLMs consistently confuse negated and non-negated findings. To enable learning beyond simple condition absence, we further construct a contextual clinical negation dataset that encodes structured claims and supports attribute-level negations involving location and severity. Building on these resources, we propose Negation-Aware Selective Training (NAST), an interpretability-guided adaptation method that uses causal tracing effects (CTEs) to modulate layer-wise gradient updates during fine-tuning. Rather than applying uniform learning rates, NAST scales each layer's update according to its causal contribution to negation processing, transforming mechanistic interpretability signals into a principled optimization rule. Experiments demonstrate improved discrimination of affirmative and negated clinical statements without degrading general vision-language alignment, highlighting the value of causal interpretability for targeted model adaptation in safety-critical medical settings. Code and resources are available at https://github.com/healthylaife/NAST.
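
The layer-wise scaling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the min-max normalization, the `floor` parameter, the plain SGD update, the layer names, and the CTE values are all assumptions chosen for illustration, since the abstract does not specify how CTE scores are mapped to update magnitudes.

```python
from typing import Dict, List


def cte_to_scales(cte: Dict[str, float], floor: float = 0.1) -> Dict[str, float]:
    """Map raw per-layer CTE scores to multipliers in [floor, 1.0].

    Min-max normalization is an assumption made for this sketch.
    """
    lo, hi = min(cte.values()), max(cte.values())
    span = hi - lo
    if span == 0:
        # All layers contribute equally: fall back to uniform updates.
        return {name: 1.0 for name in cte}
    return {name: floor + (1.0 - floor) * (v - lo) / span for name, v in cte.items()}


def scaled_sgd_step(
    params: Dict[str, List[float]],
    grads: Dict[str, List[float]],
    scales: Dict[str, float],
    base_lr: float = 0.01,
) -> Dict[str, List[float]]:
    """One plain SGD step in which each layer's gradient is multiplied by its scale."""
    return {
        name: [w - base_lr * scales[name] * g for w, g in zip(params[name], grads[name])]
        for name in params
    }


# Hypothetical CTE scores for three text-encoder layers (made-up numbers).
cte = {"layer_1": 0.30, "layer_2": 0.50, "layer_3": 0.10}
scales = cte_to_scales(cte)

# Toy parameters and gradients, identical across layers so that only the
# CTE-derived scale changes the size of each layer's update.
params = {name: [1.0, 1.0] for name in cte}
grads = {name: [1.0, -1.0] for name in cte}
updated = scaled_sgd_step(params, grads, scales)
```

In this toy setup the layer with the highest CTE score receives the full base learning rate, while the layer with the lowest score is updated at one tenth of it. The non-zero floor is a design choice of the sketch that keeps low-CTE layers trainable rather than frozen.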

Paper Structure (45 sections, 7 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Negation failure under polarity-controlled medical descriptions. For the same chest X-ray, two multiple-choice sets are identical except for a single semantically equivalent phrase: a negated finding ("No pneumonia is seen") versus its affirmative counterpart ("Aerated alveoli are seen"). Despite the minimal wording change and shared clinical meaning, representative medical VLMs predict correctly under the affirmative phrasing but fail under the negated phrasing, illustrating a systematic affirmative bias.
  • Figure 2: MedNega-CXR benchmark construction pipeline. Starting from MIMIC-CXR images and binary diagnostic labels, the pipeline generates (1) structured label permutations, (2) LLM-generated negation MCQs, (3) affirmative rewrites, and (4) final paired MCQs. All mappings and outputs were reviewed by two board-certified radiologists (10+ years of experience).
  • Figure 3: Evaluation of medical vision-language models on the proposed diagnostic benchmark. Accuracy (%) is reported separately for affirmative-equivalent and negated multiple-choice question sets. Despite identical sentence structure and equivalent underlying clinical meaning, all models exhibit consistently lower performance on negated formulations, revealing a systematic affirmative bias in medical VLMs.
  • Figure 4: Overview of Negation-Aware Selective Training (NAST). Layer-wise causal tracing estimates each text encoder layer’s contribution to negation processing. The resulting CTE scores guide fine-tuning on the contextual negation dataset via layer-specific gradient scaling, assigning larger updates to high-CTE layers and smaller updates to low-CTE layers.
  • Figure 5: Negator-selective attention across layers and heads in the CLIP text encoder. The heatmap shows negator-selective attention per head, and the bar plot shows layer-wise averages. Negation-sensitive attention concentrates in early layers (1-4), peaking at layer 2, indicating localized rather than uniformly distributed processing.
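
The paired evaluation summarized in the Figure 3 caption, where accuracy is reported separately for affirmative-equivalent and negated question sets, can be sketched as follows. The record layout, the function name `polarity_accuracy`, and the toy predictions are hypothetical and chosen only for illustration; they are not results from the paper or the benchmark's actual data format.

```python
from typing import Dict, List, Tuple


def polarity_accuracy(records: List[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute accuracy (%) per polarity.

    Each record is (polarity, predicted_option, correct_option).
    """
    totals: Dict[str, int] = {}
    hits: Dict[str, int] = {}
    for polarity, predicted, correct in records:
        totals[polarity] = totals.get(polarity, 0) + 1
        hits[polarity] = hits.get(polarity, 0) + int(predicted == correct)
    return {p: 100.0 * hits[p] / totals[p] for p in totals}


# Toy predictions (made up): four affirmative and four negated questions.
records = [
    ("affirmative", "A", "A"),
    ("affirmative", "B", "B"),
    ("affirmative", "C", "D"),
    ("affirmative", "A", "A"),
    ("negated", "A", "A"),
    ("negated", "B", "C"),
    ("negated", "D", "A"),
    ("negated", "B", "B"),
]

acc = polarity_accuracy(records)
# A positive gap means the model does better on affirmative phrasing.
gap = acc["affirmative"] - acc["negated"]
```

Because the two question sets are built to share the same clinical meaning, a positive gap between the two accuracies is the quantity that would indicate an affirmative bias.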