Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong
TL;DR
DiffuSearch introduces a discrete diffusion-based approach to embed implicit, future-aware lookahead directly into a policy for chess, aiming to surpass both searchless and explicit-search baselines. By conditioning a Transformer policy on a multi-step future via a discrete diffusion process, it achieves higher action accuracy, puzzle-solving ability, and Elo than one-step and MCTS-enhanced policies, using Stockfish as the supervision oracle. Key contributions include a practical diffusion-based training objective, a horizon-aware future representation, and comprehensive ablations showing the importance of future dynamics, diffusion design, and self-attention depth. The results suggest implicit future modeling can rival explicit search in structured planning tasks, with potential applicability to long-horizon reasoning in language and decision-making systems.
Abstract
In the post-AlphaGo era, there has been renewed interest in search techniques such as Monte Carlo Tree Search (MCTS), particularly in their application to Large Language Models (LLMs). This renewed attention is driven by the recognition that current next-token prediction models often lack the ability for long-term planning. Is it possible to instill search-like abilities within the models to enhance their planning abilities without relying on explicit search? We propose DiffuSearch, a model that performs implicit search by looking into the future world via discrete diffusion modeling. We instantiate DiffuSearch on a classical board game, chess, where explicit search is known to be essential. Through extensive controlled experiments, we show that DiffuSearch outperforms both the searchless and explicit search-enhanced policies. Specifically, DiffuSearch outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% on action accuracy. Furthermore, DiffuSearch demonstrates a notable 30% enhancement in puzzle-solving abilities compared to explicit search-based policies, along with a significant 540 Elo increase in game-playing strength assessment. These results indicate that implicit search via discrete diffusion is a viable alternative to explicit search over a one-step policy. All code is publicly available at https://github.com/HKUNLP/DiffuSearch.
