Word Salad Chopper: Reasoning Models Waste A Ton Of Decoding Budget On Useless Repetitions, Self-Knowingly

Wenya Xie, Shaochen Zhong, Hoang Anh Duy Le, Zhaozhuo Xu, Jianwen Xie, Zirui Liu

TL;DR

This work identifies extensive token waste in large reasoning models caused by word-salad self-repetitions during long thinking. It introduces WordSaladChopper, a lightweight, on-the-fly intervention that detects word-salad chunks with a single-layer linear classifier, pragmatically chops the reasoning trajectory at a defined point, and regenerates within a fixed budget to preserve output quality. Empirical results show substantial reductions in useless tokens with minimal impact on accuracy and negligible latency overhead, suggesting WSC as a practical, turnkey addition for LRM deployments. Overall, the approach offers a minimally invasive path to improve decoding efficiency in reasoning tasks, with clear directions for future refinements and broader applicability across model-task pairs.

Abstract

Large Reasoning Models (LRMs) are often bottlenecked by the high cost of output tokens. We show that a significant portion of these tokens are useless self-repetitions - what we call "word salad" - that exhaust the decoding budget without adding value. Interestingly, we observe that LRMs are self-aware when trapped in these loops: the hidden states of <\n\n> tokens trailing each reasoning chunk exhibit patterns that allow us to detect word salad behavior on-the-fly via a single-layer linear classifier. Once detected, a simple chop followed by a straightforward regeneration prompt yields substantial length savings with minimal quality loss. Our work offers WordSaladChopper (WSC) - a lightweight, turnkey component for LRMs that is minimally invasive to the reasoning trajectory because it removes only semantically redundant tokens. Given its low overhead, strong savings, and the lack of semantic value of word salad tokens, we believe it is not too far-fetched to argue that WSC - or a similar component - is a must-have for all LRM applications with user experience in mind. Our code is publicly available at https://github.com/wenyaxie023/WordSaladChopper.

Paper Structure

This paper contains 45 sections, 2 equations, 1 figure, 12 tables, 1 algorithm.

Figures (1)

  • Figure 1: General workflow of WordSaladChopper. 1) Detect: We let the reasoning model generate freely, following its original reasoning flow. Meanwhile, we classify the hidden state of each chunk's trailing <\n\n> token on the fly using our trained linear classifier; 2) Chop: Once a chopping point is reached (in this case, defined as two consecutive chunks being detected as word salad), we truncate the generation to the left of it; 3) Regenerate: We append a regeneration prompt with a constant budget, allowing the model to complete its answer on its own, either by emitting <eos> or by running until the new budget is fully spent.
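
The detect-and-chop logic in the Figure 1 caption can be illustrated with a small, dependency-free Python sketch. It is not the authors' implementation. The function names, the zero decision threshold, the placeholder regeneration prompt, and the choice to drop the flagged chunks themselves are all assumptions made for illustration. Real hidden states would come from the model, and here they are plain lists of floats.

```python
from typing import List, Optional, Sequence


def linear_score(hidden: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Single-layer linear classifier logit: w . h + b."""
    return sum(w * h for w, h in zip(weights, hidden)) + bias


def flag_chunks(hidden_states: Sequence[Sequence[float]],
                weights: Sequence[float], bias: float) -> List[bool]:
    """Flag each chunk whose trailing-token hidden state scores above 0.

    The threshold of 0 (probability 0.5 under a sigmoid) is an assumption.
    """
    return [linear_score(h, weights, bias) > 0.0 for h in hidden_states]


def find_chop_index(flags: Sequence[bool], patience: int = 2) -> Optional[int]:
    """Index of the first chunk in the first run of `patience` consecutive
    flagged chunks, or None if no such run exists."""
    run = 0
    for i, flagged in enumerate(flags):
        run = run + 1 if flagged else 0
        if run == patience:
            return i - patience + 1
    return None


def chop_and_prompt(chunks: Sequence[str], flags: Sequence[bool],
                    regen_prompt: str, patience: int = 2) -> str:
    """Keep chunks left of the chop index and append the regeneration prompt.

    Dropping the flagged chunks themselves is an assumption of this sketch.
    If no chop point is found, the trajectory is returned unchanged.
    """
    idx = find_chop_index(flags, patience)
    if idx is None:
        return "\n\n".join(chunks)
    return "\n\n".join(list(chunks[:idx]) + [regen_prompt])


# Toy usage: 2-dimensional "hidden states" and a hypothetical prompt.
chunks = ["step 1", "step 2", "wait, wait", "wait, wait"]
states = [[-1.0, 0.0], [-0.5, 0.2], [2.0, 1.0], [1.5, 0.5]]
flags = flag_chunks(states, weights=[1.0, 1.0], bias=0.0)
context = chop_and_prompt(chunks, flags, "<hypothetical regeneration prompt>")
```

The resulting `context` would then be passed back to the model with a fixed token budget for the regeneration step. That call is omitted because it depends on the serving stack.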