Conflict Adaptation in Vision-Language Models
Xiaoyang Hu
TL;DR
This paper investigates whether vision-language models (VLMs) exhibit human-like conflict adaptation and seeks its representational basis. The authors combine a sequential Stroop task, run on 13 VLMs, with a sparse autoencoder (SAE) analysis that yields task-relevant supernodes, including color and text features and a conflict-modulated node. They find behavior consistent with conflict adaptation in 12 of 13 models and identify interpretable color and text features, along with a conflict-sensitive supernode whose ablation substantially increases errors, which indicates a causal role. The results suggest that VLMs exhibit emergent cognitive-control-like dynamics with both modality-specific and general conflict processing, offering insights into goal-directed processing in multimodal artificial agents.
Abstract
A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial following another high-conflict trial. This phenomenon offers an account of how cognitive control, a scarce resource, is recruited. Using a sequential Stroop task, we find that 12 of 13 vision-language models (VLMs) tested exhibit behavior consistent with conflict adaptation, with the lone exception likely reflecting a ceiling effect. To understand the representational basis of this behavior, we use sparse autoencoders (SAEs) to identify task-relevant supernodes in InternVL 3.5 4B. Partially overlapping supernodes emerge for text and color in both early and late layers, and their relative sizes mirror the automaticity asymmetry between reading and color naming in humans. We further isolate a conflict-modulated supernode in layers 24-25 whose ablation significantly increases Stroop errors while minimally affecting congruent trials.
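To make the behavioral definition concrete, the following is a minimal sketch of how a conflict adaptation effect could be computed from a sequence of Stroop trials. It compares accuracy on incongruent trials that follow an incongruent trial (iI) with accuracy on incongruent trials that follow a congruent trial (cI). The `Trial` record, the `conflict_adaptation` function, and the use of accuracy as the performance measure are all assumptions made for illustration; the abstract does not specify the metric the authors use.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Tuple


@dataclass
class Trial:
    # Hypothetical record of one Stroop trial (not the authors' data format).
    congruent: bool  # True if the word and its ink color agree
    correct: bool    # True if the model named the ink color correctly


def conflict_adaptation(trials: List[Trial]) -> Tuple[float, float, float]:
    """Return (acc_iI, acc_cI, effect) over incongruent current trials.

    acc_iI: accuracy on incongruent trials preceded by an incongruent trial.
    acc_cI: accuracy on incongruent trials preceded by a congruent trial.
    effect: acc_iI - acc_cI; a positive value is consistent with
            conflict adaptation under this (assumed) accuracy-based measure.
    """
    after_incongruent: List[bool] = []
    after_congruent: List[bool] = []
    # Walk over consecutive (previous, current) pairs in presentation order.
    for prev, cur in zip(trials, trials[1:]):
        if cur.congruent:
            continue  # only high-conflict current trials enter the measure
        if prev.congruent:
            after_congruent.append(cur.correct)
        else:
            after_incongruent.append(cur.correct)
    if not after_incongruent or not after_congruent:
        raise ValueError("need at least one iI and one cI trial pair")
    acc_iI = mean(after_incongruent)
    acc_cI = mean(after_congruent)
    return acc_iI, acc_cI, acc_iI - acc_cI


if __name__ == "__main__":
    # Toy sequence, invented purely for illustration.
    demo = [
        Trial(congruent=True, correct=True),
        Trial(congruent=False, correct=False),
        Trial(congruent=False, correct=True),
        Trial(congruent=True, correct=True),
        Trial(congruent=False, correct=True),
        Trial(congruent=False, correct=True),
    ]
    print(conflict_adaptation(demo))
```

A model at ceiling (near-perfect accuracy in both conditions) would yield an effect close to zero under this measure, which is one way the lone exception noted above could arise.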
