CyclicReflex: Improving Reasoning Models via Cyclical Reflection Token Scheduling

Chongyu Fan, Yihua Zhang, Jinghan Jia, Alfred Hero, Sijia Liu

TL;DR

This work proposes cyclical reflection token scheduling (termed CyclicReflex), a training-free decoding strategy that dynamically modulates reflection token logits with a bidirectional, position-dependent triangular waveform, incurring no additional computation cost.

Abstract

Large reasoning models (LRMs), such as OpenAI's o1 and DeepSeek-R1, harness test-time scaling to perform multi-step reasoning for complex problem-solving. This reasoning process, executed before producing final answers, is often guided by special juncture tokens that prompt self-evaluative reflection. These transition markers and reflective cues are referred to as "reflection tokens" (e.g., "wait", "but", "alternatively"). In this work, we treat reflection tokens as a "resource" and introduce the problem of resource allocation, aimed at improving the test-time compute performance of LRMs by adaptively regulating the frequency and placement of reflection tokens. Through empirical analysis, we show that both excessive and insufficient use of reflection tokens, referred to as over-reflection and under-reflection, can degrade model performance. To better understand this trade-off, we draw an analogy between reflection token usage and learning rate scheduling in optimization. Building on this insight, we propose cyclical reflection token scheduling (termed CyclicReflex), a training-free decoding strategy that dynamically modulates reflection token logits with a bidirectional, position-dependent triangular waveform, incurring no additional computation cost. Experiments on MATH500, AIME2024/2025, AMC2023, GPQA Diamond, and LiveCodeBench demonstrate that CyclicReflex consistently improves performance across model sizes (1.5B-14B), outperforming standard decoding and recent approaches such as TIP (thought switching penalty) and S1. Code is available at https://github.com/OPTML-Group/CyclicReflex.
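
The logit modulation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names `triangular_offset` and `adjust_logits` are hypothetical, and the exact waveform and its phase (here starting at -A and peaking at +A at mid-period) are assumptions. Only the amplitude A, the period C, and the range between -A and A come from the paper's Figure 4 caption.

```python
def triangular_offset(t: int, amplitude: float, period: int) -> float:
    """Triangular wave over token position t, bounded by [-amplitude, +amplitude].

    Assumed phase: the wave starts at -amplitude when t = 0, rises linearly
    to +amplitude at t = period / 2, then falls back to -amplitude.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    phase = (t % period) / period  # position within the current cycle, in [0, 1)
    return amplitude * (1.0 - 4.0 * abs(phase - 0.5))


def adjust_logits(logits, reflection_ids, t, amplitude, period):
    """Return a copy of `logits` with the offset added to reflection tokens only.

    `reflection_ids` is a hypothetical collection of vocabulary indices for
    reflection tokens such as "wait", "but", or "alternatively".
    """
    delta = triangular_offset(t, amplitude, period)
    ids = set(reflection_ids)
    return [x + delta if i in ids else x for i, x in enumerate(logits)]


if __name__ == "__main__":
    # Toy vocabulary of five tokens; indices 1 and 3 stand in for reflection tokens.
    logits = [0.5, 1.0, -0.2, 0.3, 0.0]
    for t in range(0, 9, 2):
        print(t, adjust_logits(logits, [1, 3], t, amplitude=2.0, period=8))
```

Because the offset is a closed-form function of the token position, a sketch like this adds only a constant-time adjustment per decoding step, which is consistent with the claim that the strategy is training-free.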

Paper Structure

This paper contains 22 sections, 3 equations, 12 figures, 9 tables.

Figures (12)

  • Figure 1: Schematic overview of our proposed method (CyclicReflex). The rightmost subfigure presents a comparison of final answer accuracy between CyclicReflex, the original LRM (DeepSeek-R1-Distill-Llama-8B), and decoding variants using TIP [wang2025thoughts] and S1 [muennighoff2025s1].
  • Figure 2: (a) Answers from DeepSeek-R1-Distill-Qwen-7B on MATH500 clustered into Easy, Medium, and Hard using K-means over reflection word count and generation length. Each point represents one answer. (b) Accuracy of original decoding and TIP across difficulty levels. (c) Generation examples of original decoding and TIP for a problem from the Medium category.
  • Figure 3: Examples of the landscape of thought for under-reflection, desired reflection, and over-reflection, generated by DeepSeek-R1-Distill-Qwen-7B with the original decoding strategy. Each point represents a reasoning step, and points are connected in the order of generation. Darker regions indicate steps with higher semantic alignment to the correct answer.
  • Figure 4: Illustration of CyclicReflex (Eq. \ref{eq:cyclic}), where $t$ denotes the token position and $\delta(t)$ the logit adjustment on reflection tokens, oscillating between $-A$ and $A$ with amplitude $A$ and period $C$.
  • Figure 5: Accuracy vs. generation length on (a) GPQA Diamond and (b) LiveCodeBench. The comparison includes the original decoding, TIP, and CyclicReflex on DeepSeek-R1-Distill-Qwen-1.5B/7B and DeepSeek-R1-Distill-Llama-8B.
  • ...and 7 more figures