Solving LLM Repetition Problem in Production: A Comprehensive Study of Multiple Solutions

Weiwei Wang, Weijie Zou, Jiyong Min

TL;DR

The paper investigates the repetition problem in production LLM deployments, focusing on batch code interpretation where greedy decoding induces looping and dramatic processing delays. It combines theoretical Markov-model analysis with empirical evaluation of three remedies: Beam Search with early_stopping as a universal inference-time solution, a task-specific presence_penalty, and a universal model-level Direct Preference Optimization (DPO) fine-tuning. The study demonstrates that early_stopping is the decisive parameter for Beam Search effectiveness, and that DPO can universally mitigate repetition across BadCase types, while presence_penalty effectively handles BadCase 1. The results translate into practical production guidance, including parameter configurations, framework integration notes, and recommended deployment practices to restore normal performance in production systems.

Abstract

The repetition problem, where Large Language Models (LLMs) continuously generate repetitive content without proper termination, poses a critical challenge in production deployments, causing severe performance degradation and system stalling. This paper presents a comprehensive investigation of, and multiple practical solutions for, the repetition problem encountered in real-world batch code interpretation tasks. We identify three distinct repetition patterns: (1) business rule generation repetition, (2) method call relationship analysis repetition, and (3) PlantUML diagram syntax generation repetition. Through rigorous theoretical analysis based on Markov models, we establish that the root cause lies in greedy decoding's inability to escape repetitive loops, exacerbated by self-reinforcement effects. Our comprehensive experimental evaluation demonstrates three viable solutions: (1) Beam Search decoding with early_stopping=True serves as a universal post-hoc mechanism that effectively resolves all three repetition patterns; (2) the presence_penalty hyperparameter provides an effective solution specifically for BadCase 1; and (3) Direct Preference Optimization (DPO) fine-tuning offers a universal model-level solution for all three BadCases. The primary value of this work lies in combining first-hand production experience with extensive experimental validation. Our main contributions include a systematic theoretical analysis of repetition mechanisms, a comprehensive evaluation of multiple solutions with task-specific applicability mapping, the identification of early_stopping as the critical parameter for Beam Search effectiveness, and practical production-ready solutions validated in real deployment environments.
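The looping behaviour attributed to greedy decoding can be illustrated with a toy first-order Markov chain. The sketch below is only an illustration under stated assumptions: the three-token vocabulary, the transition probabilities, and the function name `greedy_decode` are hypothetical and are not taken from the paper, and the penalty is applied to log-probabilities as a simplified stand-in for how a presence penalty is typically applied to logits.

```python
import math

# Hypothetical toy transition probabilities (not from the paper).
# From either state, the most likely next token leads back into the A/B cycle.
TRANSITIONS = {
    "A": {"A": 0.1, "B": 0.6, "<eos>": 0.3},
    "B": {"A": 0.6, "<eos>": 0.4},
}


def greedy_decode(start, presence_penalty=0.0, max_steps=20):
    """Greedy decoding over the toy chain.

    A flat presence_penalty is subtracted from the log-probability of every
    token that has already been generated, regardless of how often.
    """
    generated = []
    seen = set()
    state = start
    for _ in range(max_steps):
        scores = {}
        for token, prob in TRANSITIONS[state].items():
            score = math.log(prob)
            if token in seen:
                score -= presence_penalty
            scores[token] = score
        next_token = max(scores, key=scores.get)  # greedy: take the argmax
        generated.append(next_token)
        if next_token == "<eos>":
            break
        seen.add(next_token)
        state = next_token
    return generated


looping = greedy_decode("A")                          # no penalty
penalized = greedy_decode("A", presence_penalty=1.0)  # with penalty
print(len(looping), "<eos>" in looping)
print(penalized)
```

In this toy setting, plain greedy decoding alternates between the two states until the step limit is reached and never emits the end token, whereas the penalized run terminates once both cycle tokens have been seen. Whether a given penalty value works on a real model depends on the task, which is consistent with the abstract's finding that presence_penalty is effective only for BadCase 1.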

Paper Structure

This paper contains 74 sections, 7 equations, 2 figures, 10 tables.

Figures (2)

  • Figure 1: Transaction Processing Workflow: Sequential steps showing LLM involvement
  • Figure 2: Performance-Overhead Tradeoff: Time and Memory Overhead vs Beam Width

Theorems & Definitions (1)

  • proof