Latent Planning via Embedding Arithmetic: A Contrastive Approach to Strategic Reasoning

Andrew Hamara, Greg Hamerly, Pablo Rivas, Andrew C. Freeman

TL;DR

SOLIS introduces an evaluation-aligned latent space, learned via supervised contrastive learning, that enables planning by vector arithmetic in chess. It avoids learning explicit dynamics or policy/value heads and plays competitively against Stockfish with only a shallow search, reaching an Elo of roughly 2500 or higher at depth $5$ with anchored scoring. The approach yields interpretable latent trajectories and suggests that planning in a compact embedding space can be a compute-efficient alternative for large-action, perfect-information domains, with potential extensions to other games and control tasks. Combining a single global advantage direction $\vec{a}$ with embedding arithmetic enables fast planning while maintaining strong qualitative and quantitative performance.

Abstract

Planning in high-dimensional decision spaces is increasingly being studied through the lens of learned representations. Rather than training policies or value heads, we investigate whether planning can be carried out directly in an evaluation-aligned embedding space. We introduce SOLIS, which learns such a space using supervised contrastive learning. In this representation, outcome similarity is captured by proximity, and a single global advantage vector orients the space from losing to winning regions. Candidate actions are then ranked according to their alignment with this direction, reducing planning to vector operations in latent space. We demonstrate this approach in chess, where SOLIS uses only a shallow search guided by the learned embedding to reach competitive strength under constrained conditions. More broadly, our results suggest that evaluation-aligned latent planning offers a lightweight alternative to traditional dynamics models or policy learning.
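The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes each candidate move has already been embedded as a vector $z'$ and that the global advantage vector $\vec{a}$ is given; how the paper constructs either is not shown here. The function names (`cosine`, `rank_candidates`) and the toy 3-dimensional vectors are hypothetical and chosen only for illustration. Scores are the plain cosine alignment $\cos(z', \vec{a})$, i.e. the unanchored variant; the anchored variant is not reproduced.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(candidates, advantage):
    """Score (label, embedding) pairs by alignment with the advantage
    direction and return (label, score) pairs, best first."""
    scored = [(label, cosine(z, advantage)) for label, z in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (assumption): made-up embeddings, not outputs of the SOLIS encoder.
advantage = [1.0, 0.0, 0.0]
candidates = [
    ("move_a", [0.2, 0.9, 0.1]),
    ("move_b", [0.8, 0.1, 0.0]),
    ("move_c", [-0.5, 0.5, 0.0]),
]

ranking = rank_candidates(candidates, advantage)
best_move = ranking[0][0]
for label, score in ranking:
    print(f"{label}: {score:+.3f}")
```

In this toy setup the candidate whose embedding points most nearly along the advantage direction is ranked first. In a shallow search, the same scoring would presumably be applied to the positions at the search frontier rather than only to immediate successors.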

Paper Structure

This paper contains 35 sections, 6 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: UMAP projection of the learned embedding space of our Base model, colored by win probability (gold = White favored, green = Black favored).
  • Figure 2: System overview of SOLIS. Candidate moves are embedded and scored by their alignment with the global advantage vector. For simplicity, this diagram shows the unanchored scoring mechanism $\cos(z', \vec{a})$.
  • Figure 3: Elo calculations for our Mini and Base models at varying search widths and depths with the unanchored planning method. Shaded bands denote 95% confidence intervals.
  • Figure 4: Elo calculations for our Mini and Base models at varying search widths and depths with the anchored planning method. Shaded bands denote 95% confidence intervals.
  • Figure 5: Latent trajectory visualizations of three games embedded in the shared representation space. Red arrows indicate the progression of positions as the game unfolds.
  • ...and 4 more figures