Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning
Ming Chen, Sheng Tang, Rong-Xi Tan, Ziniu Li, Jiacheng Chen, Ke Xue, Chao Qian
TL;DR
The paper tackles the mismatch between token-level supervision and continuous targets in decoding-based regression. It introduces GenRe^2, an RL-based framework that treats decoding as a Markov Decision Process (MDP) and optimizes sequence-level rewards to enforce global numerical coherence, yielding consistent improvements over token-level baselines on both tabular and code-metric regression. Through extensive experiments and ablations, the authors show that sequence-level feedback improves predictive precision and sampling efficiency, establishing decoding-based regression as a robust approach to numerical prediction. The work also analyzes RL training dynamics, the shaping of the output distribution, and the stability of GRPO versus ReMax, offering guidance for future RL-enabled regression methods.
Abstract
Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm for applying large language models to numerical prediction. However, its progress is hindered by the misalignment between discrete token-level objectives (e.g., cross-entropy) and continuous numerical values. Existing approaches that rely on token-level constraints often fail to capture the global magnitude of the target value, which limits their precision and generalization. In this paper, we propose to unlock the potential of decoding-based regression via Reinforcement Learning (RL). We formulate the generation process as a Markov Decision Process and use sequence-level rewards to enforce global numerical coherence. Extensive experiments on tabular regression and code metric regression demonstrate that our method (specifically with ReMax and GRPO) consistently outperforms both state-of-the-art token-level baselines and traditional regression heads, showing the benefit of introducing sequence-level signals. Our analysis further reveals that RL significantly enhances sampling efficiency and predictive precision, establishing decoding-based regression as a robust and accurate paradigm for general-purpose numerical prediction.
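
To make the idea of a sequence-level reward concrete, the following is a minimal sketch, not the paper's implementation. It assumes a character-level tokenization of numbers, a reward equal to the negative absolute error of the decoded value, and a fixed penalty for unparsable outputs; all function names and constants are hypothetical. It then computes advantages in the style commonly described for GRPO (standardizing rewards within a sampled group) and for ReMax (subtracting the reward of the greedy decode).

```python
import statistics


def decode_number(tokens):
    """Join digit/point tokens into a float; return None if unparsable."""
    try:
        return float("".join(tokens))
    except ValueError:
        return None


def sequence_reward(tokens, target, invalid_penalty=-10.0):
    """Assumed reward: negative absolute error of the whole decoded value."""
    value = decode_number(tokens)
    if value is None:
        return invalid_penalty  # hypothetical penalty for malformed outputs
    return -abs(value - target)


def grpo_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def remax_advantages(rewards, greedy_reward):
    """ReMax-style advantages: subtract the greedy decode's reward."""
    return [r - greedy_reward for r in rewards]


target = 3.14
# Four sampled decodes for the same input (illustrative values only).
samples = [
    ["3", ".", "1", "4"],  # exact
    ["3", ".", "2", "4"],  # one token off, small numerical error
    ["4", ".", "1", "4"],  # one token off, large numerical error
    ["3", ".", "."],       # malformed
]
greedy = ["3", ".", "2", "4"]

rewards = [sequence_reward(s, target) for s in samples]
grpo = grpo_advantages(rewards)
remax = remax_advantages(rewards, sequence_reward(greedy, target))
```

In this toy group, the second and third samples each differ from the target in exactly one token, so a token-level loss would treat them similarly, yet their sequence-level rewards differ by roughly a factor of ten. This is the kind of global magnitude information that the abstract argues token-level objectives fail to capture.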
