LLaMA based Punctuation Restoration With Forward Pass Only Decoding

Yutong Pang; Debjyoti Paul; Kevin Jiang; Xuedong Zhang; Xin Lei

TL;DR

This work tackles punctuation restoration for ASR outputs by fine-tuning LLaMA2 with LoRA to improve accuracy and scalability across domains. It introduces Forward Pass Only Decoding (FPOD), a non-autoregressive decoding method that turns generation into a verification task solved in a single forward pass, with sliding-window and recursive variants to handle long contexts. The paper reports that FPOD delivers speedups of up to 19.8x with minimal loss in punctuation F1 on LibriSpeech-PC, and that recursive FPOD performs well on long utterances, outperforming RNNT and Whisper baselines. Overall, FPOD addresses the speed bottleneck and hallucination concerns, making LLaMA-based punctuation restoration viable for large-scale data annotation and cross-language deployment.

Abstract

This paper introduces two advancements in the field of Large Language Model annotation, with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA to punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To address these issues, our second contribution presents Forward Pass Only Decoding (FPOD), a novel decoding approach for annotation tasks. This method yields a substantial 19.8x improvement in inference speed, addressing a critical bottleneck and enhancing the practical utility of LLaMA for large-scale data annotation tasks without hallucinations. The combination of these contributions not only solidifies LLaMA as a powerful tool for punctuation restoration but also highlights FPOD as a crucial strategy for overcoming speed constraints.
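The idea of treating punctuation restoration as verification rather than generation can be illustrated with a small sketch. This is only our reading of the summary above, not the authors' implementation: the unpunctuated input is fed to the model as if it were the response, one forward pass yields a next-token prediction at every position, and a punctuation mark is inserted wherever that prediction is a punctuation token. The names `fpod_restore` and `toy_predict_next_all`, the punctuation set, and the word-level tokens are all hypothetical; the toy predictor merely stands in for a real LLaMA forward pass.

```python
from typing import Callable, List, Sequence

# Assumed punctuation inventory; the paper's actual label set may differ.
PUNCTUATION = {",", ".", "?"}


def toy_predict_next_all(words: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for ONE forward pass of a language model.

    Returns, for each position i, the token the "model" predicts to follow
    words[: i + 1]. A real system would read these from the logits of a
    single batched forward pass instead of using hand-written rules.
    """
    preds = []
    for i, word in enumerate(words):
        if word == "hello":
            preds.append(",")            # toy rule: comma after a greeting
        elif i == len(words) - 1:
            preds.append(".")            # toy rule: period at the end
        else:
            preds.append(words[i + 1])   # otherwise predict the next word
    return preds


def fpod_restore(
    words: Sequence[str],
    predict_next_all: Callable[[Sequence[str]], List[str]],
) -> str:
    """Insert punctuation using a single call to the predictor.

    No token is generated step by step: the input words are kept verbatim,
    so the output cannot contain words that were not in the input.
    """
    preds = predict_next_all(words)      # the only "model" call
    out = []
    for word, pred in zip(words, preds):
        out.append(word + pred if pred in PUNCTUATION else word)
    return " ".join(out)


restored = fpod_restore(["hello", "nice", "to", "meet", "you"], toy_predict_next_all)
print(restored)
```

Because the output is assembled from the input words plus predicted punctuation marks, this style of decoding leaves no room for inserted or altered words, which is consistent with the abstract's claim that FPOD avoids hallucinations. The speedup comes from replacing one model call per generated token with a single call per input.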

Paper Structure (10 sections, 4 equations, 4 figures, 2 tables, 1 algorithm)

Introduction
Proposed Method
Auto Regressive Generation
Speculative Decoding
Forward Pass Only Decoding
Experiments
LoRA Finetuned Model for Punctuation Restoration
Punctuation Benchmark with Different Decoding Methods
LLaMA based model vs. RNNT model and Whisper for long input utterance
Conclusion

Figures (4)

Figure 1: LoRA based Llama2 finetuning prompt template with example instruction, input and response.
Figure 2: Directly feeding input as response in prompt for forward pass only decoding (FPOD) scheme.
Figure 3: FPOD for punctuation restoration.
Figure 4: Sliding window with padding approach for long input text.
