Inducing Dyslexia in Vision Language Models

Melika Honarmand, Ayati Sharma, Badr AlKhamissi, Johannes Mehrer, Martin Schrimpf

TL;DR

This work models dyslexia by functionally identifying and ablating visual-word-form-selective units in a vision-language model, producing reading-specific impairments while leaving general reasoning intact. Using benchmarks designed for humans (ROAR, Raven, Kempler, and orthography/phonology lexical tasks), the authors show that targeted VWFA-like perturbations yield phonological deficits without orthographic disruptions, aligning with human dyslexia patterns. The study introduces a minimal subnetwork (about 6.89% of units) whose ablation reproduces key dyslexic features, and interprets the results through the lens of VWFA functionality, providing a causal, mechanistic framework for studying reading disorders. Beyond dyslexia, the paper presents a generalizable methodology for using functional localization and targeted perturbations in VLMs as digital twins to explore brain disorders and develop hypothesis-driven interventions.

Abstract

Dyslexia, a neurodevelopmental disorder characterized by persistent reading difficulties, is often linked to reduced activity of the visual word form area in the ventral occipito-temporal cortex. Traditional approaches to studying dyslexia, such as behavioral and neuroimaging methods, have provided valuable insights but remain limited in their ability to test causal hypotheses about the underlying mechanisms of reading impairments. In this study, we use large-scale vision-language models (VLMs) to simulate dyslexia by functionally identifying and perturbing artificial analogues of word processing. Using stimuli from cognitive neuroscience, we identify visual-word-form-selective units within VLMs and demonstrate that targeted ablation of these units, unlike ablation of random units, leads to selective impairments in reading tasks while general visual and language comprehension abilities remain intact. In particular, the resulting model matches dyslexic humans' phonological deficits without a significant change in orthographic processing. Taken together, our modeling results replicate key characteristics of dyslexia and establish a computational framework for investigating reading disorders.

Paper Structure

This paper contains 30 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Modeling dyslexia via visual-word-form hypoactivation. (a) In humans, reduced activity in the visual word form area is thought to result in diminished performance on reading-related measures while sparing general visual intelligence. (b) Testing this hypothesis in vision-language models, we find that ablating visual-word-form-selective units produces the same dissociation.
  • Figure 2: Identifying visual-word-form-selective units in VLMs. (1) To identify VWF-selective units, we compare unit activations in response to images of words versus images of non-words, and identify the units that exhibit the strongest word selectivity. (2) To model the reduced VWFA activity observed in dyslexic individuals, we ablate the localized units. As a control, we ablate an equal number of randomly selected units. (3) To assess the impact of ablations, we evaluate model performance on dyslexia screening tasks (ROAR [yeatman2021rapid] and the Lexical Decision benchmark [luke2023dyslexics]) as well as on visual IQ and reasoning tasks (RAVEN [burke1958raven] and the Kempler [kempler1998sentence] sentence comprehension tasks).
  • Figure 3: Ablating visual-word-form-selective units in the model. (a) Increasing the number of ablated VWF units produces a severe, monotonic performance decline on ROAR (blue), while RAVEN (gray) is only affected at larger mask sizes. We chose the first mask size (bold) at which ROAR performance falls below the dyslexia threshold (blue dashed line). Shaded regions represent 95% confidence intervals. (b) Beyond full ablation (bold), scaling unit activity (with a fixed mask size of 6.89%) has little effect for positive scaling, while negative scaling severely degrades outputs non-selectively. (c) While ablations in all layer types substantially affect performance, only the MLP components of the language decoder showed selective effects, indicating their core involvement in reading (full trends in Fig. \ref{fig:mask_size_v0}). (d) Distribution of VWF-selective units across the 80 transformer blocks of the language decoder. Ratios are means across 20 random seeds and resamplings of the localizer stimuli; standard deviations never exceed 0.03%.
  • Figure 4: Reading-selective deficits from ablating VWF-selective units. (a) Ablating VWF-selective units led to a selective reading deficit below the dyslexia threshold (ROAR, $p<0.012$), while performance on visual IQ and reasoning benchmarks (RAVEN, Kempler) remained intact or was slightly enhanced. (b) Ablating an equal number of randomly selected units from the same layers affected performance throughout, with ROAR remaining above the dyslexia threshold and significant impairments to visual reasoning. Dark blue bars indicate ablated model accuracy, relative to the intact model (light blue bars). Error bars denote 95% confidence intervals, and significance was assessed with a one-sample, one-tailed Student’s t-test.
  • Figure 5: Model mirrors human phonological reading deficits. (a) Lexical decision accuracy on the ROAR reading test before (light blue) and after (dark blue) ablation of VWF-selective units. Following ablation, the model's accuracy at real/pseudo word classification drops below the dyslexia threshold, paralleling dyslexic participants (dark green) relative to control subjects (light green). (b) Ablated model performance decreases for phonologically confusable stimuli (e.g. "beaf", which sounds the same as "beef" but is a pseudo word), indicating a phonological deficit and mirroring observations in humans. (c) Ablated model performance is not significantly affected on orthographically confusable stimuli (e.g. "golve", which looks similar to "glove" but is a pseudo word), whereas dyslexic humans tend to be affected. Error bars denote 95% confidence intervals; model results are averaged over 20 random seeds, each corresponding to a different sample of the localizer; significance was assessed via a one-tailed Student’s t-test (Appendix \ref{sec:Statistics}).
  • ...and 4 more figures
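
The localize-then-ablate procedure described in the Figure 2 and Figure 3 captions can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the activations are synthetic, the selectivity score is a Welch t-statistic (the captions only say units with "the strongest word selectivity" are chosen), and the function names `welch_t`, `localize_units`, and `scale_units` are hypothetical. Only the overall recipe (rank units by word versus non-word response, mask the top fraction, compare against an equally sized random mask, and treat ablation as scaling activity by zero) comes from the captions.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t-statistic for two samples (an assumed selectivity score)."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    denom = (var_a / len(a) + var_b / len(b)) ** 0.5
    if denom == 0:
        return 0.0
    return (statistics.mean(a) - statistics.mean(b)) / denom


def localize_units(word_acts, nonword_acts, fraction):
    """Return the indices of the top `fraction` of units by word selectivity.

    word_acts / nonword_acts: one activation vector per stimulus image.
    """
    n_units = len(word_acts[0])
    scores = [
        welch_t([stim[u] for stim in word_acts],
                [stim[u] for stim in nonword_acts])
        for u in range(n_units)
    ]
    k = max(1, round(fraction * n_units))
    ranked = sorted(range(n_units), key=scores.__getitem__, reverse=True)
    return set(ranked[:k])


def scale_units(activations, mask, scale=0.0):
    """Scale the masked units' activity; scale=0.0 is full ablation."""
    return [a * scale if u in mask else a for u, a in enumerate(activations)]


# Synthetic stand-in for model activations: the first N_SELECTIVE units
# respond more strongly to word images than to non-word images.
rng = random.Random(0)
N_UNITS, N_STIMULI, N_SELECTIVE = 100, 40, 7


def fake_activations(boost):
    return [
        [rng.gauss(0.0, 1.0) + (boost if u < N_SELECTIVE else 0.0)
         for u in range(N_UNITS)]
        for _ in range(N_STIMULI)
    ]


word_acts = fake_activations(boost=3.0)
nonword_acts = fake_activations(boost=0.0)

# Targeted mask versus a size-matched random control mask. The sketch uses a
# single flat layer, so "same layers" for the control is trivially satisfied.
vwf_mask = localize_units(word_acts, nonword_acts, fraction=0.07)
control_mask = set(rng.sample(range(N_UNITS), len(vwf_mask)))

# Full ablation of the localized units on one activation vector.
ablated = scale_units(word_acts[0], vwf_mask)
```

In a real VLM the same masks would be applied to intermediate activations during the forward pass, and passing a nonzero `scale` corresponds to the activity-scaling variant described in Figure 3(b).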