
Conflict Adaptation in Vision-Language Models

Xiaoyang Hu

TL;DR

Conflict Adaptation in Vision-Language Models asks whether VLMs exhibit human-like conflict adaptation and what representations underlie it. The authors run a sequential Stroop task on 13 VLMs and then use sparse autoencoders (SAEs) on InternVL 3.5 4B to identify task-relevant supernodes, including color and text features and a conflict-modulated node. Behavior consistent with conflict adaptation appears in 12 of 13 models. The SAE analysis yields interpretable color and text features and a conflict-sensitive supernode whose ablation substantially increases Stroop errors, indicating a causal role. The results suggest that VLMs exhibit emergent dynamics resembling cognitive control, with both modality-specific and more general conflict processing, and offer insight into goal-directed processing in multimodal artificial agents.
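The sequential Stroop design pairs two color-word trials, and the congruency of each trial defines four conditions (CC, CI, IC, II). The following is a minimal sketch of that pairing, not the authors' stimulus code: the color palette, the `StroopItem` type, and the `make_item`/`make_pair` helpers are hypothetical, and rendering the words into an image (left-right or top-down arrangement) is left out.

```python
import random
from dataclasses import dataclass
from typing import Tuple

# Hypothetical palette; the paper's actual color set may differ.
COLORS = ("red", "green", "blue", "yellow")


@dataclass(frozen=True)
class StroopItem:
    word: str  # the color word that is printed
    ink: str   # the color the word is printed in (the correct answer)

    @property
    def congruent(self) -> bool:
        return self.word == self.ink


def make_item(congruent: bool, rng: random.Random) -> StroopItem:
    """Sample one item; incongruent items pair a word with a different ink color."""
    ink = rng.choice(COLORS)
    if congruent:
        return StroopItem(word=ink, ink=ink)
    word = rng.choice([c for c in COLORS if c != ink])
    return StroopItem(word=word, ink=ink)


def make_pair(condition: str, rng: random.Random) -> Tuple[StroopItem, StroopItem]:
    """Build a two-trial sequence for a condition label such as "CI" or "II".

    The first letter gives the congruency of the first trial and the second
    letter that of the second trial (C = congruent, I = incongruent).
    """
    if condition not in {"CC", "CI", "IC", "II"}:
        raise ValueError(f"unknown condition: {condition!r}")
    first = make_item(condition[0] == "C", rng)
    second = make_item(condition[1] == "C", rng)
    return first, second


if __name__ == "__main__":
    rng = random.Random(0)
    for cond in ("CI", "II"):
        first, second = make_pair(cond, rng)
        print(cond, first, second)
```

The two panels of Figure 1 correspond to a `"CI"` pair and an `"II"` pair in this scheme.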

Abstract

A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial when it follows another high-conflict trial rather than a low-conflict one. This phenomenon offers an account of how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation, with the lone exception likely reflecting a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases Stroop errors while minimally affecting congruent trials.
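One common way to ablate a set of SAE features while leaving the rest of an activation untouched is to zero those features, decode, and add back the SAE's reconstruction error. The sketch below assumes a standard ReLU SAE with parameters `w_enc`, `b_enc`, `w_dec`, `b_dec` applied to one layer's activation, and treats a supernode as a set of feature indices; the abstract does not specify the exact ablation procedure, so this is an illustrative assumption rather than the paper's method, and all function names are hypothetical. A supernode spanning layers 24-25 would be handled by applying it once per layer with that layer's SAE. Plain Python lists keep it dependency-free.

```python
from typing import List, Sequence, Set

Vector = List[float]
Matrix = List[List[float]]  # row-major: one inner list per row


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def encode(x: Sequence[float], w_enc: Matrix, b_enc: Sequence[float],
           b_dec: Sequence[float]) -> Vector:
    """ReLU SAE encoder: f = ReLU(W_enc (x - b_dec) + b_enc); w_enc is d_sae x d_model."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, centered), b_enc)]


def decode(f: Sequence[float], w_dec: Matrix, b_dec: Sequence[float]) -> Vector:
    """SAE decoder: x_hat = W_dec f + b_dec; w_dec is d_model x d_sae."""
    return [d + b for d, b in zip(matvec(w_dec, f), b_dec)]


def ablate_supernode(x: Sequence[float], supernode: Set[int], w_enc: Matrix,
                     b_enc: Sequence[float], w_dec: Matrix,
                     b_dec: Sequence[float]) -> Vector:
    """Zero the features in `supernode` while keeping the SAE reconstruction error."""
    f = encode(x, w_enc, b_enc, b_dec)
    # Reconstruction error: whatever the SAE fails to explain is passed through unchanged.
    error = [xi - hi for xi, hi in zip(x, decode(f, w_dec, b_dec))]
    f_ablated = [0.0 if i in supernode else fi for i, fi in enumerate(f)]
    return [h + e for h, e in zip(decode(f_ablated, w_dec, b_dec), error)]
```

Because the error term is preserved, an empty supernode returns the original activation, and ablating a set amounts to subtracting each ablated feature's decoder column scaled by its activation.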

Paper Structure

This paper contains 8 sections and 7 figures.

Figures (7)

  • Figure 1: Sequential Stroop task design. Left: a congruent trial followed by an incongruent trial. Right: an incongruent trial followed by another incongruent trial.
  • Figure 2: Average log probability assigned to the correct second color token in each condition, under the left-right word arrangement. 12 of 13 models tested show higher values for II (incongruent following incongruent) than for CI (incongruent following congruent). Accuracy for each condition is shown below its bar.
  • Figure 3: Average log probability assigned to the correct second color token in each condition, under the top-down word arrangement. 12 of 13 models tested show higher values for II (incongruent following incongruent) than for CI (incongruent following congruent). Accuracy for each condition is shown below its bar.
  • Figure 4: Qwen2.5 VL 72B Instruct shows reverse conflict adaptation.
  • Figure 5: Supernodes for Red (color) across layers 8-11 and 19-33, and for RED (text) across layers 3-15 and 19-33. Features shared by both are shown in white.
  • ...and 2 more figures
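Figures 2 and 3 compare the average log probability of the correct second color token between II and CI sequences. The following is a minimal sketch of that comparison, assuming per-trial records with hypothetical field names (`TrialResult`, `logp_correct`, `correct`) and a hypothetical accuracy definition; it is not the authors' analysis code. A positive II minus CI difference is the pattern the figures describe as conflict adaptation, while a negative value would indicate the reverse pattern.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class TrialResult:
    condition: str       # "CC", "CI", "IC" or "II"
    logp_correct: float  # log probability of the correct second color token
    correct: bool        # hypothetical: whether the model's answer was correct


def summarize(results: Iterable[TrialResult]) -> Dict[str, Dict[str, float]]:
    """Mean log probability and accuracy for each condition."""
    grouped: Dict[str, List[TrialResult]] = {}
    for r in results:
        grouped.setdefault(r.condition, []).append(r)
    return {
        cond: {
            "mean_logp": mean(r.logp_correct for r in rs),
            "accuracy": sum(r.correct for r in rs) / len(rs),
        }
        for cond, rs in grouped.items()
    }


def conflict_adaptation_effect(summary: Dict[str, Dict[str, float]]) -> float:
    """II minus CI mean log probability; positive values match conflict adaptation."""
    return summary["II"]["mean_logp"] - summary["CI"]["mean_logp"]
```

Working in log probabilities rather than accuracy alone lets the comparison register graded differences even when accuracy is near ceiling.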