
Implicit Search via Discrete Diffusion: A Study on Chess

Jiacheng Ye, Zhenyu Wu, Jiahui Gao, Zhiyong Wu, Xin Jiang, Zhenguo Li, Lingpeng Kong

TL;DR

DiffuSearch introduces a discrete diffusion-based approach to embed implicit, future-aware lookahead directly into a policy for chess, aiming to surpass both searchless and explicit-search baselines. By conditioning a Transformer policy on a multi-step future via a discrete diffusion process, it achieves higher action accuracy, puzzle-solving ability, and Elo than one-step and MCTS-enhanced policies, using Stockfish as the supervision oracle. Key contributions include a practical diffusion-based training objective, a horizon-aware future representation, and comprehensive ablations showing the importance of future dynamics, diffusion design, and self-attention depth. The results suggest implicit future modeling can rival explicit search in structured planning tasks, with potential applicability to long-horizon reasoning in language and decision-making systems.
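As a rough illustration of what a diffusion-based training objective over future tokens can look like, the sketch below assumes an absorbing-state ("masked") discrete diffusion with a linear masking schedule and a 1/t loss weight. These choices, and the names MASK, corrupt, and diffusion_loss, are illustrative assumptions rather than DiffuSearch's exact formulation. Here x0 stands in for a tokenized future trajectory of moves and board states, and model for a Transformer conditioned on the current position.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

MASK = -1  # hypothetical id of the absorbing [MASK] token

# A "model" maps a partially masked sequence to a per-position distribution over V tokens.
Model = Callable[[Sequence[int]], List[List[float]]]


def corrupt(x0: Sequence[int], t: int, T: int, rng: random.Random) -> List[int]:
    """Forward process q(x_t | x_0): each token is independently replaced by MASK w.p. t/T."""
    p = t / T
    return [MASK if rng.random() < p else tok for tok in x0]


def diffusion_loss(model: Model, x0: Sequence[int], T: int, rng: random.Random,
                   t: Optional[int] = None) -> float:
    """One-sample estimate of a reweighted masked-token cross-entropy (assumed objective)."""
    if t is None:
        t = rng.randint(1, T)  # sample a noise level uniformly from {1, ..., T}
    xt = corrupt(x0, t, T, rng)
    probs = model(xt)
    # Cross-entropy only on masked positions: the model must recover x0 there.
    nll = sum(-math.log(probs[i][x0[i]]) for i, tok in enumerate(xt) if tok == MASK)
    # About len(x0) * t / T tokens are masked at level t, so the 1/t weight keeps
    # the expected loss on a comparable scale across noise levels.
    return nll / t
```

Only masked positions contribute to the loss, so the model is trained to fill in unseen future tokens from whatever context remains visible.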

Abstract

In the post-AlphaGo era, there has been renewed interest in search techniques such as Monte Carlo Tree Search (MCTS), particularly in their application to Large Language Models (LLMs). This renewed attention is driven by the recognition that current next-token prediction models often lack long-term planning ability. Is it possible to instill search-like abilities within the models to enhance their planning without relying on explicit search? We propose DiffuSearch, a model that performs implicit search by looking into the future world via discrete diffusion modeling. We instantiate DiffuSearch on a classical board game, chess, where explicit search is known to be essential. Through extensive controlled experiments, we show that DiffuSearch outperforms both searchless and explicit search-enhanced policies. Specifically, DiffuSearch outperforms the one-step policy by 19.2% and the MCTS-enhanced policy by 14% in action accuracy. Furthermore, DiffuSearch demonstrates a 30% improvement in puzzle-solving ability compared to explicit search-based policies, along with a 540 Elo increase in game-playing strength. These results indicate that implicit search via discrete diffusion is a viable alternative to explicit search over a one-step policy. All code is publicly available at https://github.com/HKUNLP/DiffuSearch.
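The idea of "looking into the future world via discrete diffusion" can be made concrete with a sampler that starts from a fully masked future and fills it in over T denoising steps, then reads off the next move. The sketch below uses confidence-ordered unmasking, one common sampler for masked diffusion models; whether DiffuSearch decodes this way is an assumption, as is the hypothetical layout in which position 0 holds the next move. The names imagine_future and next_action are illustrative, and T plays roughly the role of the number of diffusion timesteps.

```python
import math
from typing import Callable, List, Sequence

MASK = -1  # hypothetical id of the absorbing [MASK] token
Model = Callable[[Sequence[int]], List[List[float]]]


def imagine_future(model: Model, length: int, T: int) -> List[int]:
    """Denoise an all-MASK sequence in at most T steps, revealing confident tokens first."""
    x = [MASK] * length
    for step in range(T, 0, -1):
        masked = [i for i, tok in enumerate(x) if tok == MASK]
        if not masked:
            break
        probs = model(x)                   # one forward pass per denoising step
        k = math.ceil(len(masked) / step)  # reveal enough that nothing is masked after step 1
        candidates = []
        for i in masked:
            tok = max(range(len(probs[i])), key=probs[i].__getitem__)
            candidates.append((probs[i][tok], i, tok))
        candidates.sort(reverse=True)      # highest-confidence positions first
        for _, i, tok in candidates[:k]:
            x[i] = tok
    return x


def next_action(model: Model, length: int, T: int) -> int:
    # Hypothetical layout: position 0 is the next move, later positions the imagined future.
    return imagine_future(model, length, T)[0]
```

Because the whole future is predicted jointly at each step, the choice of the first token can be informed by the imagined later tokens, which is the intuition behind calling this implicit search.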

Paper Structure

This paper contains 48 sections, 15 equations, 7 figures, 9 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Comparison between explicit search via MCTS and implicit search via discrete diffusion. MCTS explicitly performs action selection, state evaluation, and value backup iteratively before determining the next action (as detailed in the MCTS appendix), whereas discrete diffusion implicitly gathers future information during future imagination to improve the next action.
  • Figure 2: (Left) Prediction quality analysis for DiffuSearch at different future steps. (Middle) Action accuracy when scaling self-attention layers. (Right) Action accuracy when increasing diffusion timesteps.
  • Figure 3: (Left) Action accuracy when increasing the average search depth, in MCTS through more simulations and in DiffuSearch through context length extension. (Middle) Latency, in milliseconds, as search depth increases. (Right) Action accuracy when scaling data size.
  • Figure 4: Two examples of Transformer (S-A) and DiffuSearch solving challenging puzzles. The predicted next move is shown in blue for both policies. The future actions predicted by DiffuSearch are shown in light blue and red for the two players, respectively, with the numerical counters 1, 2, and 3 indicating future steps.
  • Figure 5: Additional prediction cases on challenging puzzles.
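For contrast with the explicit-search side of Figure 1, the phases named in its caption (action selection, state evaluation, value backup, plus the expansion step that grows the tree) can be sketched on a toy game. This is classic UCT with random rollouts on a Nim variant where each player removes 1 to 3 stones and whoever takes the last stone wins; it is not the AlphaZero-style MCTS with learned policy and value networks that a chess baseline would typically use. The game and all names (Node, rollout, simulate, mcts_move) are hypothetical.

```python
import math
import random
from typing import Dict, List


class Node:
    def __init__(self, stones: int):
        self.stones = stones      # stones left; the player to move faces this pile
        self.children: Dict[int, "Node"] = {}
        self.visits = 0
        self.value_sum = 0.0      # from the perspective of the player who moved INTO this node


def legal_moves(stones: int) -> List[int]:
    return list(range(1, min(3, stones) + 1))


def rollout(stones: int, rng: random.Random) -> float:
    """Random playout; +1 if the player to move at `stones` wins, else -1."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone
    mover = 0
    while True:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return 1.0 if mover == 0 else -1.0
        mover ^= 1


def ucb(parent: Node, child: Node, c: float) -> float:
    exploit = child.value_sum / child.visits
    explore = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploit + explore


def simulate(root: Node, rng: random.Random, c: float) -> None:
    node, path = root, [root]
    # 1) Selection: descend while the node is non-terminal and fully expanded.
    while node.stones > 0 and len(node.children) == len(legal_moves(node.stones)):
        parent = node
        node = max(parent.children.values(), key=lambda ch: ucb(parent, ch, c))
        path.append(node)
    # 2) Expansion: add one untried child of a non-terminal leaf.
    if node.stones > 0:
        move = next(m for m in legal_moves(node.stones) if m not in node.children)
        child = Node(node.stones - move)
        node.children[move] = child
        node = child
        path.append(node)
    # 3) Evaluation: rollout value as seen by the player who moved into the leaf.
    value = -rollout(node.stones, rng)
    # 4) Backup: flip the sign at each ply on the way back to the root.
    for n in reversed(path):
        n.visits += 1
        n.value_sum += value
        value = -value


def mcts_move(stones: int, simulations: int = 2000, c: float = 1.4, seed: int = 0) -> int:
    rng = random.Random(seed)
    root = Node(stones)
    for _ in range(simulations):
        simulate(root, rng, c)
    # Act with the most-visited root move.
    return max(root.children, key=lambda m: root.children[m].visits)
```

With enough simulations, mcts_move(5) converges to taking 1 stone, leaving the opponent a multiple of 4, which is the known winning strategy for this game. Every simulation pays for explicit tree bookkeeping, which is the cost that implicit search via discrete diffusion aims to avoid.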