LightBeam: An Accurate and Memory-Efficient CTC Decoder for Speech Neuroprostheses

Ebrahim Feghhi; Junlin Hu; Nima Hadidi; Jonathan C. Kao

Abstract

Speech neuroprostheses, which decode speech directly from cortical neural activity, are a promising pathway for restoring communication in patients with dysarthria and anarthria. Two benchmarks, Brain-to-Text '24 and '25, released intracranial recordings from patients with dysarthria, along with a baseline algorithm trained with Connectionist Temporal Classification (CTC). Despite significant innovation on these benchmarks, all leading published work relies on a CTC decoder based on weighted finite-state transducers (WFSTs) that requires ~320 GB of RAM. These memory requirements limit accessibility for both patients and researchers. Here, we propose LightBeam, a non-WFST-based CTC decoder that requires only ~10 GB of RAM and achieves state-of-the-art performance on both benchmarks. LightBeam achieves this by integrating a large language model (LLM) into the beam-search process via delayed fusion, which removes the need for a large N-gram language model (LM). LightBeam is implemented in Python and is open-source.
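The core idea, folding an LLM into CTC beam search through delayed fusion instead of scoring every extension with a large N-gram LM, can be illustrated with a standard CTC prefix beam search in which the LM only influences pruning every `rescore_interval` frames. This is a minimal sketch under stated assumptions, not LightBeam's implementation: `lm_score` is a hypothetical stand-in for an LLM that returns the log-probability of a token prefix, the blank symbol is assumed to be index 0, and the names `ctc_beam_search_delayed_fusion`, `rescore_interval`, and `lm_weight` are illustrative.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Sequence, Tuple

NEG_INF = float("-inf")
Prefix = Tuple[int, ...]


def logsumexp(*xs: float) -> float:
    """Stable log(sum(exp(x))) over scalar log-probabilities."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_beam_search_delayed_fusion(
    log_probs: Sequence[Sequence[float]],  # [T][V] frame log-probs; index 0 is blank (assumed)
    lm_score: Callable[[Prefix], float],  # hypothetical stand-in for an LLM: log P(prefix)
    beam_size: int = 4,
    rescore_interval: int = 10,  # apply the LM only every this many frames
    lm_weight: float = 0.5,
) -> Prefix:
    # Each prefix stores (log P ending in blank, log P ending in non-blank).
    beams: Dict[Prefix, Tuple[float, float]] = {(): (0.0, NEG_INF)}
    cache: Dict[Prefix, float] = {}

    def lm(prefix: Prefix) -> float:
        # Cache LM calls, since they are the expensive part of the search.
        if prefix not in cache:
            cache[prefix] = lm_score(prefix)
        return cache[prefix]

    for t, frame in enumerate(log_probs):
        nxt: Dict[Prefix, Tuple[float, float]] = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (pb, pnb) in beams.items():
            for c, lp in enumerate(frame):
                if c == 0:  # blank: prefix unchanged, now ends in blank
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, pb + lp, pnb + lp), nb)
                    continue
                extended = prefix + (c,)
                b, nb = nxt[extended]
                if prefix and prefix[-1] == c:
                    # A repeated label only starts a new token after a blank...
                    nxt[extended] = (b, logsumexp(nb, pb + lp))
                    # ...otherwise it collapses into the same prefix.
                    b2, nb2 = nxt[prefix]
                    nxt[prefix] = (b2, logsumexp(nb2, pnb + lp))
                else:
                    nxt[extended] = (b, logsumexp(nb, pb + lp, pnb + lp))

        # Delayed fusion: the LM only affects pruning on every k-th frame.
        use_lm = (t + 1) % rescore_interval == 0
        ranked = sorted(
            nxt,
            key=lambda p: logsumexp(*nxt[p]) + (lm_weight * lm(p) if use_lm else 0.0),
            reverse=True,
        )
        beams = {p: nxt[p] for p in ranked[:beam_size]}

    # The final choice always combines acoustic and LM scores.
    return max(beams, key=lambda p: logsumexp(*beams[p]) + lm_weight * lm(p))
```

Between rescoring frames, pruning uses acoustic scores alone, so the expensive LM is queried far less often, and the cache avoids scoring the same prefix twice. For scale, if the rescore interval is counted in encoder frames (as Figure 2's caption suggests), an interval of 10 frames at 100 ms per frame means one LLM pass per second of neural data in B2T '24, and 15 frames at 80 ms means one pass every 1.2 s in B2T '25.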

Paper Structure

This paper contains 18 sections, 2 equations, 2 figures, 3 tables, and 1 algorithm.

Introduction
Related Works
Methods
Dataset
Encoder models
WFST decoder
LightBeam
Shallow fusion and homophone tracking with N-gram LM
Delayed fusion with LLM
Generative error correction
Results
Evaluation using baseline GRU encoder
Generalization to time-masked Transformer encoder
Ablations and hyperparameter sweeps
Discussion
(3 further sections not listed)
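The outline separates shallow fusion with an N-gram LM from delayed fusion with an LLM. For orientation, the common textbook form of shallow fusion (an assumption here; the paper's exact scoring terms and weights are not given in this excerpt) adds a weighted LM term to the CTC score of every hypothesis $y$ given neural features $x$ during the search:

```latex
\hat{y} = \operatorname*{arg\,max}_{y} \Big[ \log P_{\mathrm{CTC}}(y \mid x) + \alpha \log P_{\mathrm{LM}}(y) + \beta \, |y| \Big]
```

Here $\alpha$ is the LM weight and $\beta$ is a length bonus that offsets the LM's bias toward short hypotheses. Delayed fusion, as the abstract describes it, instead brings the LLM score into the beam search only at intervals rather than at every extension.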

Figures (2)

Figure 1: LightBeam continues to match or outperform the baseline WFST decoder across all benchmarks when both methods are paired with generative error correction. Results are shown with the time-masked Transformer encoder. Asterisks indicate a significant improvement under a paired t-test across $n=10$ seeds.
Figure 2: Impact of the LLM rescore interval on validation WER and RTF (real-time factor) across both datasets. An encoder frame is output every 100 ms in B2T '24 and every 80 ms in B2T '25. Shaded regions indicate the SEM across $n=10$ seeds. The LLM rescore interval was set to 10 and 15 throughout the study for B2T '24 and '25, respectively.
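Figure 1's significance marks come from a paired t-test, which compares the two decoders seed by seed rather than pooling all runs. A minimal sketch of the test statistic, assuming per-seed WERs for the two decoders are available as equal-length lists (the function name `paired_t_statistic` is hypothetical): the statistic is the mean per-seed difference divided by its standard error, with $n-1$ degrees of freedom (9 for $n=10$ seeds).

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples a[i], b[i].

    Here a and b would be per-seed metrics (e.g. WER) of two decoders run
    with the same seeds; pairing controls for variation shared by both.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1
```

A p-value then follows from the Student t distribution, for example via `scipy.stats.ttest_rel`, which computes the same paired statistic.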
