Investigating Neurons and Heads in Transformer-based LLMs for Typographical Errors

Kohei Tsuji, Tatsuya Hiraoka, Yuchang Cheng, Eiji Aramaki, Tomoya Iwakura

TL;DR

The paper investigates how Transformer-based LLMs handle typos by identifying typo-specific neurons and attention heads that support typo-fixing through local and global contexts. It introduces an identification method built on two scores, $\Delta_n$ for neurons and $\Delta_h$ for attention heads, to isolate typo-related activations and attention behavior across multiple models. The analysis reveals distinct roles for early- and late-layer neurons (local context) and middle-layer neurons (global context), as well as typo heads that attend broadly to the context. Ablation analyses show that these components contribute not only to typo correction but also to general grammatical and contextual understanding, and that reliance on typo heads varies with model size. These findings offer mechanistic insights that could guide robustness improvements by reinforcing both local and global contextual processing and language-structure awareness in LLMs.

Abstract

This paper investigates how LLMs encode inputs with typos. We hypothesize that specific neurons and attention heads recognize typos and fix them internally using local and global contexts. We introduce a method to identify typo neurons and typo heads that work actively when inputs contain typos. Our experimental results suggest the following: 1) LLMs can fix typos with local contexts when the typo neurons in either the early or late layers are activated, even if those in the other are not. 2) Typo neurons in the middle layers are responsible for the core of typo-fixing with global contexts. 3) Typo heads fix typos by attending broadly to the context rather than focusing on specific tokens. 4) Typo neurons and typo heads work not only for typo-fixing but also for understanding general contexts.
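As a rough illustration of the identification idea in the abstract, the sketch below scores each neuron by how much more it activates on typo inputs than on clean inputs, then picks the top-scoring ones. This is a minimal sketch under stated assumptions, not the paper's definition: it assumes $\Delta_n$ is the difference between a mean activation score on typo inputs and on clean inputs, and the function names (`activation_score`, `delta_scores`, `top_typo_neurons`) and the toy activations are hypothetical.

```python
from statistics import mean


def activation_score(activations):
    """Mean activation of each neuron over a set of inputs.

    activations: list of per-input rows, one float per neuron.
    (Assumed stand-in for the paper's score s_n^x.)
    """
    n_neurons = len(activations[0])
    return [mean(row[j] for row in activations) for j in range(n_neurons)]


def delta_scores(clean_acts, typo_acts):
    """Assumed Delta_n: typo-input score minus clean-input score, per neuron."""
    s_clean = activation_score(clean_acts)
    s_typo = activation_score(typo_acts)
    return [t - c for t, c in zip(s_typo, s_clean)]


def top_typo_neurons(deltas, k):
    """Indices of the k neurons with the largest positive shift."""
    order = sorted(range(len(deltas)), key=lambda j: deltas[j], reverse=True)
    return order[:k]


# Toy activations: 2 inputs x 3 neurons (made-up numbers, not real model data).
clean = [[0.1, 0.5, 0.0], [0.2, 0.4, 0.1]]
typo = [[0.9, 0.5, 0.0], [0.8, 0.3, 0.2]]

deltas = delta_scores(clean, typo)
typo_neurons = top_typo_neurons(deltas, k=1)
```

In this toy data only the first neuron responds much more strongly to typo inputs, so it is the one selected. An analogous difference over attention statistics could serve as a stand-in for $\Delta_h$, again as an assumption.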

Paper Structure

This paper contains 32 sections, 4 equations, 10 figures, 6 tables.

Figures (10)

  • Figure 1: The dataset overview (left), an overview of an input example to the LLM (middle), and the visualization of $M_x$ for calculating the neuron activation score $s_{n}^{x}$ (right).
  • Figure 2: Accuracy on the word identification task with different numbers of typos $t$.
  • Figure 3: Distribution of ${\Delta}_{n}$ (upper) and percentage of typo neurons per layer (lower) with $t=1$. The left figures are for Gemma 2, the center figures are for the Llama 3 family, and the right figures are for Qwen 2.5.
  • Figure 4: Distribution of typo neurons per layer for damaged and undamaged samples. Values above the black line indicate that many typo neurons were activated when the LLMs predicted the correct words.
  • Figure 5: Distribution of ${\Delta}_{h}$ for each model with $t=1$. The heat map colors are centered around 0, and the tick mark closest to 0 on the positive side of the heat bar represents the maximum ${\Delta}_{h}$. The left figures are for Gemma 2, the center figures are for the Llama 3 family, and the right figures are for Qwen 2.5.
  • ...and 5 more figures