DISC: Dynamic Decomposition Improves LLM Inference Scaling

Jonathan Light, Wei Cheng, Benjamin Riviere, Wu Yue, Masafumi Oyamada, Mengdi Wang, Yisong Yue, Santiago Paternain, Haifeng Chen

TL;DR

DISC addresses the inefficiency of static step sizes in LLM inference by introducing dynamic decomposition, which adaptively partitions solution steps based on real-time reward statistics. It combines a z-score-based acceptance criterion with adaptive prefix refinement to concentrate sampling on difficult regions, and is designed to be plug-and-play with greedy search, beam search, and Monte Carlo Tree Search. Empirically, DISC delivers consistent improvements in pass@k and token efficiency across APPS, MATH, and LiveCodeBench, including strong gains with open-source models and reasoning prompts, while maintaining negligible runtime overhead. The framework relies on minimal assumptions, requires only a scalar reward signal, and offers theoretical intuition about optimality under certain policy conditions, alongside practical analyses of temperature, partition fraction, and acceptance criteria. Overall, DISC provides a scalable, general approach to adaptive inference that can inform curriculum design, dataset augmentation, and future research in efficient reasoning for LLMs.

Abstract

Inference scaling methods for LLMs often rely on decomposing problems into steps (or groups of tokens), followed by sampling and selecting the best next steps. However, these steps and their sizes are often predetermined or manually designed based on domain knowledge. We propose dynamic decomposition, a method that adaptively and automatically partitions solution and reasoning traces into manageable steps during inference. By more effectively allocating compute -- particularly through subdividing challenging steps and prioritizing their sampling -- dynamic decomposition significantly improves inference efficiency. Experiments on benchmarks such as APPS, MATH, and LiveCodeBench demonstrate that dynamic decomposition outperforms static approaches, including token-level, sentence-level, and single-step decompositions, reducing the pass@10 error rate by 5.0%, 6.7%, and 10.5% respectively. These findings highlight the potential of dynamic decomposition to improve a wide range of inference scaling techniques.

Paper Structure

This paper contains 71 sections, 3 theorems, 25 equations, 48 figures, 3 algorithms.

Key Result

Theorem 1

Consider Alg. \ref{alg:discwithgreedy}. Suppose that for some problem $\boldsymbol{x}$, the optimal solution is in the support of $\pi(\cdot \mid \boldsymbol{x})$. Then at some $n > 0$, the base prefix contains the EOS token, the algorithm terminates, and the returned solution is an optimal solution. See App. sec:th

Figures (48)

  • Figure 1: Comparison of automatic decomposition strategies based on step size. Coarser steps accelerate the search process but risk skipping over optimal solutions and committing to suboptimal prefixes. In contrast, finer steps ensure more precise decisions but lead to slower search. A dynamic strategy that adapts step size based on LLM feedback offers a balanced approach, combining the efficiency of coarse steps with the precision of fine-grained decomposition.
  • Figure 2: Multiple iterations of Alg. \ref{alg:discwithgreedy}. DISC dynamically refines its step sizes across iterations, advancing and contracting the prefix from which it samples.
  • Figure 3: DISC with Greedy Search. One iteration of Alg. \ref{alg:discwithgreedy}. We start with a base prefix A and a candidate prefix B. We compare the sample statistics of each by evaluating a scoring function (e.g., z-score). If the candidate prefix B demonstrates a higher likelihood of reward improvement compared to continuing from base prefix A, we accept B, commit it as the new base, and extend the candidate to a further step (e.g., BD) on the best sampled solution. If B is not better, we reject it and propose a shorter candidate (e.g., AC), contracting the step size. This process repeats until a new candidate is accepted or all options are exhausted. The algorithm thus adaptively advances or contracts the step size and search horizon based on the relative quality of completions from each prefix.
  • Figure 4: Reward distribution of rollouts sampled from two different prefixes. The probability of sampling a higher-reward rollout is greater from prefix 2 than from prefix 1.
  • Figure 5: Token-level comparisons across benchmarks using gpt-4o-mini. (Left) APPS competition level (Middle) MATH500 (Right) LiveCodeBench. DISC achieves superior inference scaling over baselines on all three benchmarks.
  • ...and 43 more figures
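
The accept-or-contract loop described in the Figure 3 caption, together with the prefix comparison illustrated in Figure 4, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the normal approximation in `exceed_probability`, the choice of the best observed reward as the comparison target, the `fraction` contraction rule, and the `sample_rewards` callback (a hypothetical function returning rollout rewards for a prefix) are all introduced here for illustration only.

```python
import math
import statistics
from typing import Callable, List, Sequence


def exceed_probability(rewards: Sequence[float], target: float) -> float:
    """Estimate P(reward > target) via a normal approximation (an assumption)."""
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards) if len(rewards) > 1 else 0.0
    if std == 0.0:
        # Degenerate sample: every rollout had the same reward.
        return 1.0 if mean > target else 0.0
    z = (target - mean) / std  # z-score of the target under this prefix
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # equals 1 - Phi(z)


def accept_candidate(base_rewards: Sequence[float],
                     candidate_rewards: Sequence[float]) -> bool:
    """Accept if the candidate prefix looks more likely to beat the best reward seen."""
    target = max(max(base_rewards), max(candidate_rewards))
    return (exceed_probability(candidate_rewards, target)
            > exceed_probability(base_rewards, target))


def next_base_prefix(base: List[str],
                     best_suffix: List[str],
                     sample_rewards: Callable[[List[str]], List[float]],
                     fraction: float = 0.5) -> List[str]:
    """Try successively shorter candidates; return the first accepted one, else base.

    `sample_rewards` and `fraction` are hypothetical names used only in this sketch.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    if not best_suffix:
        return base
    base_rewards = sample_rewards(base)
    step = max(1, int(len(best_suffix) * fraction))
    while step >= 1:
        candidate = base + best_suffix[:step]
        if accept_candidate(base_rewards, sample_rewards(candidate)):
            return candidate  # advance: the candidate becomes the new base
        step = int(step * fraction)  # contract: propose a shorter candidate
    return base  # all candidates exhausted
```

In a full search loop, the returned prefix would serve as the new base for the next iteration, and a caller would presumably stop once the base prefix contains the EOS token, as in the statement of Theorem 1.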

Theorems & Definitions (11)

  • Theorem 1: Optimality of DISC
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Lemma 1
  • Proof 1
  • Theorem 2: Optimality of DISC
  • Proof 2
  • ...and 1 more