LLaMA based Punctuation Restoration With Forward Pass Only Decoding

Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

TL;DR

This work tackles punctuation restoration for ASR outputs by leveraging LLaMA2 with LoRA fine-tuning to improve accuracy and scalability across domains. It introduces Forward Pass Only Decoding (FPOD), a non-autoregressive decoding method that converts generation into a verification task with a single forward pass, supplemented by sliding-window and recursive variants to handle long contexts. The paper shows that FPOD delivers substantial speedups (up to ~19.8x) with minimal loss in punctuation F1 scores on Librispeech-PC, and that recursive FPOD achieves excellent performance on long utterances, outperforming RNNT and Whisper baselines. Overall, FPOD addresses speed bottlenecks and hallucination concerns, making LLaMA-based punctuation restoration viable for large-scale data annotation and cross-language deployment.

Abstract

This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To address this, our second contribution presents Forward Pass Only Decoding (FPOD), a novel decoding approach for annotation tasks. This innovative method results in a substantial 19.8x improvement in inference speed, effectively addressing a critical bottleneck and enhancing the practical utility of LLaMA for large-scale data annotation tasks without hallucinations. The combination of these contributions not only solidifies LLaMA as a powerful tool for punctuation restoration but also highlights FPOD as a crucial strategy for overcoming speed constraints.
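The verification idea behind FPOD can be illustrated with a small sketch. It assumes that one forward pass over the unpunctuated words yields, for every position, the model's top next-token prediction, and that a punctuation mark is inserted wherever that prediction is a punctuation token. The names `fpod_restore` and `toy_model` and the punctuation set are hypothetical; `toy_model` is a rule-based stand-in for the fine-tuned LLaMA model, not the authors' implementation.

```python
from typing import Callable, List

# Assumed punctuation inventory, for illustration only.
PUNCTUATION = {",", ".", "?", "!"}


def fpod_restore(words: List[str],
                 predict_all: Callable[[List[str]], List[str]]) -> str:
    """Insert punctuation using one batch of per-position predictions.

    predict_all(words)[i] stands for the model's top next token after
    words[: i + 1]; a real model would produce all of these in a single
    forward pass over the unpunctuated input.
    """
    predictions = predict_all(words)
    if len(predictions) != len(words):
        raise ValueError("need exactly one prediction per input word")
    out = []
    for word, pred in zip(words, predictions):
        # Verification step: accept the prediction only if it is punctuation.
        # The input words themselves are never rewritten.
        out.append(word + pred if pred in PUNCTUATION else word)
    return " ".join(out)


def toy_model(words: List[str]) -> List[str]:
    """Hypothetical rule-based stand-in for the language model."""
    preds = []
    for i, word in enumerate(words):
        if i == len(words) - 1:
            preds.append(".")           # end of utterance
        elif word == "hello":
            preds.append(",")           # greeting followed by a comma
        else:
            preds.append(words[i + 1])  # otherwise predict the next word
    return preds


if __name__ == "__main__":
    print(fpod_restore(["hello", "how", "are", "you"], toy_model))
```

Because the output is assembled only from the input words plus accepted punctuation tokens, the sketch cannot add, drop, or alter words, which is the sense in which this style of decoding avoids hallucinated text.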

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: LoRA based Llama2 finetuning prompt template with example instruction, input and response.
  • Figure 2: Directly feeding input as response in prompt for forward pass only decoding (FPOD) scheme.
  • Figure 3: FPOD for punctuation restoration.
  • Figure 4: Sliding window with padding approach for long input text.
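The sliding-window idea named in the Figure 4 caption can be sketched as follows. The paper's exact window and padding rules are not given in this summary, so the sketch assumes a simple scheme: the text is cut into consecutive windows of `window` words, each window is extended by up to `pad` words of context on both sides, and only the predictions for the unpadded core are kept. All names and parameters here are hypothetical.

```python
from typing import Callable, List, Tuple

# Assumed punctuation inventory, for illustration only.
PUNCTUATION = {",", ".", "?", "!"}


def sliding_windows(n_words: int, window: int,
                    pad: int) -> List[Tuple[int, int, int, int]]:
    """Return (ctx_start, ctx_end, keep_start, keep_end) for each window.

    [keep_start, keep_end) is the core whose predictions are kept;
    [ctx_start, ctx_end) is the core plus padding fed to the model.
    """
    if window <= 0 or pad < 0:
        raise ValueError("window must be positive and pad non-negative")
    spans = []
    start = 0
    while start < n_words:
        keep_end = min(start + window, n_words)
        ctx_start = max(0, start - pad)
        ctx_end = min(n_words, keep_end + pad)
        spans.append((ctx_start, ctx_end, start, keep_end))
        start = keep_end
    return spans


def restore_long(words: List[str],
                 predict_all: Callable[[List[str]], List[str]],
                 window: int, pad: int) -> str:
    """Run a per-position predictor window by window, keeping core outputs."""
    out = []
    for ctx_start, ctx_end, keep_start, keep_end in sliding_windows(
            len(words), window, pad):
        preds = predict_all(words[ctx_start:ctx_end])
        for i in range(keep_start, keep_end):
            pred = preds[i - ctx_start]
            out.append(words[i] + pred if pred in PUNCTUATION else words[i])
    return " ".join(out)


def toy_predictor(chunk: List[str]) -> List[str]:
    """Hypothetical stand-in model: predicts '.' after the word 'stop'."""
    return ["." if word == "stop" else "" for word in chunk]


if __name__ == "__main__":
    text = "go on stop then wait stop and leave now stop".split()
    print(restore_long(text, toy_predictor, window=4, pad=2))
```

The padding gives words near a window edge some left and right context, while keeping only the core predictions ensures that each word receives exactly one decision.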