Improving Latent Reasoning in LLMs via Soft Concept Mixing
Kang Wang, Xiangyu Duan, Tianyi Du
TL;DR
This work addresses the mismatch between discrete token reasoning and continuous latent concepts in LLMs by introducing Soft Concept Mixing (SCM). SCM generates a soft concept vector $\widetilde{\boldsymbol{se}}_t = \sum_i p_{t,i} \boldsymbol{e}(x_i)$ from the output distribution and injects it into the hidden state via $\boldsymbol{h}'_t = \boldsymbol{h}_t + \widetilde{\boldsymbol{se}}_t$, enabling latent reasoning in training. The policy is optimized with Group Relative Policy Optimization (GRPO) using multiple rollouts and a composite reward, providing stable gradients without a separate value function. Across GSM8K, MATH500, AIME 2024, GPQA-Diamond, and MMLU, SCM outperforms CoT, Soft Thinking, and GRPO baselines while displaying minimal latent-space drift in PCA analyses, demonstrating both enhanced reasoning and representational stability.
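As a rough illustration of why GRPO needs no separate value function, the sketch below computes group-relative advantages in the way GRPO is commonly formulated: each rollout's reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the toy rewards are assumptions for illustration; the composite reward used in this work is not reproduced here.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (hypothetical helper).

    The group mean acts as the baseline, so no learned value function is needed.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an assumption, implementations vary
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group of four rollouts for one prompt (made-up scalar rewards).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean receive positive advantages and those below receive negative ones; if every rollout in a group earns the same reward, all advantages are zero.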
Abstract
Unlike humans, who reason in abstract conceptual spaces, large language models (LLMs) typically reason by generating discrete tokens, which may limit their expressive power. The recent work Soft Thinking has shown that latent reasoning via soft concepts is a promising direction, but LLMs are still trained on discrete tokens. To reduce this gap between the soft concepts used in reasoning and the discrete tokens used in training, we propose Soft Concept Mixing (SCM), a soft-concept-aware training scheme that directly exposes the model to soft representations during training. Specifically, SCM constructs a soft concept vector as a probability-weighted average of token embeddings. This vector is then mixed into the model's hidden states, which embody rich contextual information. Finally, the entire latent reasoning process is optimized with Reinforcement Learning (RL). Experiments on five reasoning benchmarks demonstrate that SCM improves the reasoning performance of LLMs while maintaining stable training dynamics.
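The two SCM steps described above (a probability-weighted average of embeddings, then mixing into the hidden state) can be sketched in a few lines of plain Python. This is a minimal toy, not the authors' implementation: the helper names, the three-token vocabulary, and the two-dimensional embeddings are hypothetical, and it assumes the hidden state and the embeddings share the same dimension so that they can be added directly.

```python
import math

def softmax(logits):
    """Turn logits into an output distribution p_t over the vocabulary."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def soft_concept_vector(probs, embeddings):
    """Probability-weighted average of token embeddings: sum_i p_i * e(x_i)."""
    dim = len(embeddings[0])
    return [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]

def mix_into_hidden(hidden, soft_concept):
    """Additive mixing h' = h + se (assumes matching dimensions)."""
    return [h + s for h, s in zip(hidden, soft_concept)]

# Toy vocabulary of three tokens with 2-dimensional embeddings (made-up values).
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = softmax([0.0, 0.0, 0.0])            # uniform distribution over the 3 tokens
se = soft_concept_vector(probs, embeddings)
h_mixed = mix_into_hidden([0.5, -0.5], se)
```

With a uniform distribution the soft concept vector is simply the mean of the three embeddings; a sharply peaked distribution would instead recover something close to a single token's embedding, which is the discrete case.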
