Circuit Tracing in Vision-Language Models: Understanding the Internal Mechanisms of Multimodal Thinking
Jingcheng Yang, Tianhu Xiong, Shengyi Qian, Klara Nahrstedt, Mingyuan Wu
TL;DR
This work introduces the first framework for transparent circuit tracing in VLMs, enabling systematic analysis of multimodal reasoning, and shows that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations.
Abstract
Vision-language models (VLMs) are powerful but remain black boxes. We introduce the first framework for transparent circuit tracing in VLMs to systematically analyze multimodal reasoning. Using transcoders, attribution graphs, and attention-based methods, we uncover how VLMs hierarchically integrate visual and semantic concepts. We show that distinct visual feature circuits can handle mathematical reasoning and support cross-modal associations. Feature steering and circuit patching experiments demonstrate that these circuits are causal and controllable, laying the groundwork for more explainable and reliable VLMs.
