Solving Rubik's Cube Without Tricky Sampling

Yicheng Lin; Siyu Liang

TL;DR

The paper tackles sparse-reward reinforcement learning for the Rubik’s Cube by learning directly from fully scrambled states, avoiding near-solved-state sampling and search. It introduces a policy-gradient framework centered on ChaseNet, a neural predictor of state-pair costs, integrated into NX, Env, and Actor modules to guide learning from disordered configurations. On the 2x2x2 cube, the approach achieves over 99.4% success across 50,000 scrambled trials without tree search, demonstrating strong performance in a challenging sparse-reward setting. These results suggest a promising direction for generalized sparse-reward problems and motivate scaling to larger puzzles and diverse domains.

Abstract

The Rubik's Cube, with its vast state space and sparse reward structure, presents a significant challenge for reinforcement learning (RL) because rewarded states are difficult to reach. Previous research addressed this by propagating cost-to-go estimates outward from the solved state and by incorporating search techniques. These approaches differ from human strategies, which start from fully scrambled cubes, and their reliance on sampling near the solved state can be tricky to carry over to a general sparse-reward problem. In this paper, we introduce a novel RL algorithm that uses policy gradient methods to solve the Rubik's Cube without relying on near-solved-state sampling. Our approach employs a neural network to predict cost patterns between states, allowing the agent to learn directly from scrambled states. We tested our method on the 2x2x2 Rubik's Cube: across 50,000 scrambled cubes, the model solved over 99.4% of cases. Notably, this result was achieved using only the policy network, without the tree search used in previous methods, demonstrating its effectiveness and its potential for broader application to sparse-reward problems.
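To give a sense of scale for the "vast state space" and the sparsity of the reward, the short sketch below counts reachable states using the standard combinatorial formulas for the 2x2x2 and 3x3x3 cubes. These formulas and the function names are not taken from the paper; they are well-known counting results included here only for illustration, and the 2x2x2 count assumes one corner is held fixed to factor out whole-cube rotations.

```python
from math import factorial


def pocket_cube_states() -> int:
    """Reachable states of the 2x2x2 cube with one corner held fixed.

    The 7 remaining corners can be permuted freely (7!), and 6 of them can
    be twisted independently (3**6); the last twist is then determined.
    """
    return factorial(7) * 3**6


def standard_cube_states() -> int:
    """Reachable states of the 3x3x3 cube.

    Corners: 8! permutations and 3**7 orientations.
    Edges: 12! permutations and 2**11 orientations.
    Corner and edge permutation parities must match, hence the division by 2.
    """
    return factorial(8) * 3**7 * factorial(12) * 2**11 // 2


if __name__ == "__main__":
    n2 = pocket_cube_states()
    n3 = standard_cube_states()
    print(f"2x2x2 states: {n2:,}")
    print(f"3x3x3 states: {n3:,}")
    # If the reward is given only at the single solved state, a uniformly
    # random 2x2x2 state is rewarded with probability 1 / n2.
    print(f"Chance a random 2x2x2 state is solved: {1 / n2:.3e}")
```

Even for the 2x2x2 cube, only one state in several million carries a reward under this counting, which is why an agent that starts from scrambled states needs some signal other than the terminal reward.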
