PromptASR for contextualized ASR with controllable style
Xiaoyu Yang, Wei Kang, Zengwei Yao, Yifan Yang, Liyong Guo, Fangjun Kuang, Long Lin, Daniel Povey
TL;DR
PromptASR addresses contextualized ASR with a prompt-based conditioning mechanism: a dedicated text encoder encodes content and style prompts, and its outputs are injected into a neural transducer through cross-attention. Using the text of preceding utterances or a biasing word list as the content prompt, and optionally adding a style prompt, PromptASR achieves relative word error rate reductions of 21.9% on a book-reading dataset and 6.8% on an in-house dataset, improves rare-word recognition, and remains robust when no prompt is available. The approach therefore benefits from both utterance-level context and word-level biasing, and it can control transcription style, such as casing and punctuation. These results suggest practical ways to integrate textual context and style control into end-to-end ASR, with possible extensions toward more efficient prompt processing and the incorporation of large language models.
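The conditioning step can be pictured as cross-attention in which speech-encoder frames act as queries and text-encoder outputs act as keys and values. The sketch below is a single-head, single-layer simplification in NumPy. The function names, the residual connection, and the pass-through when no prompt is given are illustrative assumptions; the actual model in icefall may differ in the number of heads, the layers where attention is applied, and how a missing prompt is handled.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def prompt_cross_attention(
    speech: np.ndarray,  # (T, d) frames from the speech encoder -> queries
    text: np.ndarray,    # (U, d) encodings from the text encoder -> keys/values
    w_q: np.ndarray,     # (d, d) query projection
    w_k: np.ndarray,     # (d, d) key projection
    w_v: np.ndarray,     # (d, d) value projection
    w_o: np.ndarray,     # (d, d) output projection
) -> np.ndarray:
    """Hypothetical single-head cross-attention from speech frames to a text prompt.

    Returns a (T, d) array: the speech features plus a residual term that
    summarizes the prompt as seen from each frame.
    """
    if text.shape[0] == 0:
        # Assumed behavior for an empty prompt: leave speech features unchanged.
        return speech.copy()
    q = speech @ w_q                           # (T, d)
    k = text @ w_k                             # (U, d)
    v = text @ w_v                             # (U, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, U) scaled dot products
    attn = softmax(scores, axis=-1)            # each frame's weights over prompt tokens sum to 1
    context = attn @ v                         # (T, d) prompt summary per frame
    return speech + context @ w_o              # residual injection into the speech stream


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, T, U = 8, 5, 3
    speech = rng.standard_normal((T, d))
    text = rng.standard_normal((U, d))
    weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4)]
    out = prompt_cross_attention(speech, text, *weights)
    print(out.shape)  # (5, 8)
```

Letting speech attend to text, rather than the reverse, keeps the output aligned with the T acoustic frames, so the downstream transducer sees a sequence of the same length whatever the prompt length.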
Abstract
Prompts are crucial to large language models because they provide context information such as topic or logical relationships. Inspired by this, we propose PromptASR, a framework that integrates prompts into end-to-end automatic speech recognition (E2E ASR) systems to achieve contextualized ASR with controllable transcription style. Specifically, a dedicated text encoder encodes the text prompts, and the resulting encodings are injected into the speech encoder through cross-attention between the features of the two modalities. When the ground-truth text of preceding utterances is used as the content prompt, the proposed system achieves relative word error rate reductions of 21.9% on a book-reading dataset and 6.8% on an in-house dataset compared with a baseline ASR system. The system can also take word-level biasing lists as prompts to improve recognition accuracy on rare words. An additional style prompt can be given to the text encoder to guide the ASR system toward different transcription styles. The code is available in icefall.
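The abstract names three kinds of prompt: the ground-truth text of preceding utterances, word-level biasing lists, and a style prompt. A minimal sketch of how such prompts might be assembled for one training example follows. It assumes, hypothetically, that the style is expressed as a short label and that the reference transcript is rendered in that style so the model can learn to follow it. The helper names, the style labels, and the 100-word history cap are illustrative assumptions, not details from the source.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical style labels mapped to (upper_case, keep_punctuation).
# The source only says style prompts can control casing and punctuation.
STYLES: Dict[str, Tuple[bool, bool]] = {
    "mixed-case, punctuated": (False, True),
    "upper-case, no punctuation": (True, False),
}


def apply_style(text: str, style: str) -> str:
    """Render a reference transcript in the casing/punctuation style named by `style`."""
    upper_case, keep_punctuation = STYLES[style]
    if not keep_punctuation:
        # Drop punctuation but keep apostrophes inside words (e.g. "It's").
        text = re.sub(r"[^\w\s']", "", text)
        text = " ".join(text.split())
    return text.upper() if upper_case else text


def build_content_prompt(
    history: List[str],
    biasing_words: Optional[List[str]] = None,
    max_words: int = 100,
) -> str:
    """Use a biasing list if given, else the last `max_words` words of preceding utterances."""
    if biasing_words:
        return ", ".join(biasing_words)
    words = " ".join(history).split()
    return " ".join(words[-max_words:]) if max_words > 0 else ""


def make_example(
    history: List[str],
    reference: str,
    style: str,
    biasing_words: Optional[List[str]] = None,
) -> Dict[str, str]:
    """Bundle the content prompt, style prompt, and styled target for one utterance."""
    return {
        "content_prompt": build_content_prompt(history, biasing_words),
        "style_prompt": style,
        "target": apply_style(reference, style),
    }


if __name__ == "__main__":
    ex = make_example(["Chapter one.", "It was dark."], "The night was cold.",
                      "upper-case, no punctuation")
    print(ex["content_prompt"])  # Chapter one. It was dark.
    print(ex["target"])          # THE NIGHT WAS COLD
```

Falling back to an empty content prompt when there is no history mirrors the robustness requirement in the TL;DR: the recognizer must still work when no context is available.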
