InjectRBP: Steering Large Language Model Reasoning Behavior via Pattern Injection

Xiuping Wu, Zhao Yu, Yuxin Cheng, Ngai Wong, Liangjun Ke, Tapas Mishra, Konstantinos V. Katsikopoulos

TL;DR

InjectRBP demonstrates that LLM reasoning can be steered by injecting learned behavior patterns without updating model parameters. By modeling reasoning as sequences over a finite behavior alphabet and employing 3-gram injections, the authors design InjectCorrect (self-imitation from correct traces) and InjectRLOpt (MDP-based optimization with a Reliability-Aware Softmax Policy). Across GPQA, MATH, AIME25, and MBPP, these methods yield measurable gains, with InjectRLOpt often outperforming InjectCorrect and a recommended default discount factor $\gamma$ of $0.98$ for larger models. The approach highlights the significance of domain-aligned reasoning motifs and opens avenues for post-hoc reasoning enhancement without architectural changes.

Abstract

Reasoning can significantly enhance the performance of Large Language Models. While recent studies have adjusted behavior-related prompts to enhance reasoning, these designs remain largely intuitive and lack a systematic analysis of the underlying behavioral patterns. Motivated by this, we investigate how models' reasoning behaviors shape reasoning from the perspective of behavioral patterns. We observe that models exhibit adaptive distributions of reasoning behaviors when responding to specific types of questions, and that structurally injecting these patterns can substantially influence the quality of the models' reasoning processes and outcomes. Building on these findings, we propose two optimization methods that require no parameter updates: InjectCorrect and InjectRLOpt. InjectCorrect guides the model by imitating behavioral patterns derived from its own past correct answers. InjectRLOpt learns a value function from historical behavior-pattern data and, via our proposed Reliability-Aware Softmax Policy, generates behavioral injectants during inference to steer the reasoning process. Our experiments demonstrate that both methods can improve model performance across various reasoning tasks without requiring any modifications to model parameters, achieving gains of up to 5.34% and 8.67%, respectively.
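The idea of treating a reasoning trace as a sequence over a finite behavior alphabet and mining 3-grams from the model's own correct traces can be illustrated with a minimal Python sketch. The behavior labels (`plan`, `derive`, `verify`, and so on), the helper names `ngrams` and `top_patterns`, and the toy chains are all hypothetical and are not taken from the paper; the authors' actual alphabet, selection rule, and injection procedure may differ.

```python
from collections import Counter
from typing import Iterator, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def ngrams(chain: Sequence[str], n: int = 3) -> Iterator[Pattern]:
    """Yield every contiguous n-gram of one behavior chain."""
    # A chain shorter than n produces an empty range, hence no n-grams.
    return (tuple(chain[i:i + n]) for i in range(len(chain) - n + 1))


def top_patterns(chains: Sequence[Sequence[str]], n: int = 3,
                 k: int = 1) -> List[Tuple[Pattern, int]]:
    """Count n-grams across all chains and return the k most frequent."""
    counts: Counter = Counter()
    for chain in chains:
        counts.update(ngrams(chain, n))
    return counts.most_common(k)


# Hypothetical behavior chains extracted from traces that ended in a
# correct answer (labels are illustrative assumptions only).
correct_chains = [
    ["plan", "derive", "verify", "answer"],
    ["plan", "derive", "verify", "backtrack", "derive", "verify", "answer"],
    ["recall", "plan", "derive", "verify", "revise", "answer"],
]

if __name__ == "__main__":
    print(top_patterns(correct_chains, n=3, k=1))
```

In this toy data the 3-gram `("plan", "derive", "verify")` appears in all three correct chains, so a self-imitation scheme in the spirit of InjectCorrect would treat it as a candidate pattern to inject.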

Paper Structure

This paper contains 16 sections, 19 equations, 2 figures, 9 tables, 1 algorithm.

Figures (2)

  • Figure 1: The analysis process for reasoning behavior patterns, showing how reasoning text is decomposed into a structured “Reasoning Behavior Chain” and “Reasoning DNA”, and how frequent n‑gram behavior patterns are clipped and structurally injected into models to enhance their systematic reasoning ability.
  • Figure 2: Normalized frequencies of the top‑10 reasoning behavior patterns of Qwen3‑8B on the GPQA dataset. For each pattern, its frequency is shown for total, correct, incorrect, short‑reasoning, and long‑reasoning samples. Within each sample subset, frequencies are normalized to sum to 1.
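
The per-subset normalization described in the Figure 2 caption can be sketched as follows. This is only an illustration under stated assumptions: the pattern names, the counts, and the helper `normalized_top_k` are hypothetical, and the sketch assumes that frequencies are normalized over the displayed top-k patterns (the caption does not say whether the denominator is the top-k patterns or all patterns).

```python
from collections import Counter
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def normalized_top_k(subset_counts: Dict[str, Counter],
                     k: int) -> Dict[str, Dict[Pattern, float]]:
    """Pick the top-k patterns by total count, then normalize within each subset."""
    top: List[Pattern] = [p for p, _ in subset_counts["total"].most_common(k)]
    result: Dict[str, Dict[Pattern, float]] = {}
    for name, counts in subset_counts.items():
        # Assumption: the denominator covers only the displayed top-k patterns.
        denom = sum(counts[p] for p in top)
        result[name] = {p: (counts[p] / denom if denom else 0.0) for p in top}
    return result


# Hypothetical pattern counts; not data from the paper.
A, B, C = ("plan", "derive", "verify"), ("derive", "verify", "answer"), ("guess", "answer", "stop")
correct = Counter({A: 6, B: 3, C: 1})
incorrect = Counter({A: 2, B: 4, C: 4})
subsets = {"total": correct + incorrect, "correct": correct, "incorrect": incorrect}

freqs = normalized_top_k(subsets, k=2)

if __name__ == "__main__":
    for name, dist in freqs.items():
        print(name, dist)
```

Because each subset is normalized separately, the values are comparable as shares within a subset, not as raw counts across subsets.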