Table of Contents
Fetching ...

LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses

Ebrahim Feghhi, Junlin Hu, Nima Hadidi, Jonathan C. Kao

Abstract

A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which directly decode speech from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires ${\sim}$320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST based CTC decoder that requires only ${\sim}$10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the prior need for using a large N-gram LM. LightBeam is implemented in Python and is open-source.

LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses

Abstract

A promising pathway for restoring communication in patients with dysarthria and anarthria is speech neuroprostheses, which directly decode speech from cortical neural activity. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published prior work relies on a WFST-based CTC decoder that requires 320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST based CTC decoder that requires only 10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating an LLM into the beam-search process via delayed fusion, obviating the prior need for using a large N-gram LM. LightBeam is implemented in Python and is open-source.
Paper Structure (18 sections, 2 equations, 2 figures, 3 tables, 1 algorithm)

This paper contains 18 sections, 2 equations, 2 figures, 3 tables, 1 algorithm.

Figures (2)

  • Figure 1: LightBeam continues to match or outperform baseline WFST decoder across all benchmarks when both methods are paired with generative error correction. Results are shown with time-masked Transformer. Asterisks indicate significant improvement when performing paired t-test across $n=10$ seeds.
  • Figure 2: Impact of LLM rescore interval on validation WER and RTF across both datasets. An encoder frame is output every 100 ms in B2T '24, and every 80 ms in B2T '25. Shaded regions indicate SEM across $n=10$ seeds. LLM rescore interval was set to 10 and 15 throughout the study for B2T '24 and '25, respectively.