Exploring Parity Challenges in Reinforcement Learning through Curriculum Learning with Noisy Labels
Bei Zhou, Soren Riis
TL;DR
The paper frames parity learning in impartial games as a bottleneck for self-play reinforcement learning, focusing on how label noise interacts with curriculum-like exposure in bitstring representations of game states. It introduces a latent-curriculum approach with controlled label noise, analyzes bitstrings of length up to $n=100$, and uses a single-layer LSTM trained with binary cross-entropy to quantify learning dynamics, including a gradient-information bound $\mathrm{Var}(\mathcal{H}, F, \mathbf{w}) \le \frac{C}{2^n}$ under uniformly distributed data. Key findings show that learning deteriorates as bitstring length grows and that more than 5% label noise on long bitstrings prevents the network from modeling parity, while latent curricula mitigate some of the difficulty but require substantial training. Overall, the work highlights practical implications for improving self-play RL in impartial games and provides a framework for evaluating resilience to noisy labels.
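To make the noisy-label setup concrete, the following is a minimal sketch of how parity-labeled bitstrings with a controlled fraction of flipped labels might be generated. The function names, the convention that an odd number of ones maps to label 1, and the sample count are illustrative assumptions, not details taken from the paper; only the length $n=100$ and the 5% noise level come from the summary above.

```python
import random

def parity(bits):
    """Return 1 if the bitstring contains an odd number of ones, else 0 (assumed convention)."""
    return sum(bits) % 2

def noisy_parity_dataset(n, num_samples, noise_rate, seed=0):
    """Sample uniform bitstrings of length n and flip each parity label with probability noise_rate."""
    rng = random.Random(seed)
    data = []
    for _ in range(num_samples):
        bits = [rng.randint(0, 1) for _ in range(n)]
        label = parity(bits)
        if rng.random() < noise_rate:
            label = 1 - label  # inject label noise
        data.append((bits, label))
    return data

# Hypothetical configuration: length-100 bitstrings with 5% label noise.
samples = noisy_parity_dataset(n=100, num_samples=2000, noise_rate=0.05)
flipped = sum(label != parity(bits) for bits, label in samples)
print(f"{flipped} of {len(samples)} labels were flipped")
```

Because each label is flipped independently, the observed fraction of corrupted labels fluctuates around `noise_rate` rather than matching it exactly.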
Abstract
This paper examines the application of reinforcement learning (RL) to strategy games, particularly those characterized by parity challenges, as seen in specific positions of Go and Chess and in a broader range of impartial games. We propose a simulated learning process, structured within a curriculum learning framework and augmented with noisy labels, to mirror the intricacies of self-play learning scenarios. This approach enables a thorough analysis of how neural networks (NNs) adapt and evolve from elementary to increasingly complex game positions. Our empirical research indicates that even minimal label noise can significantly impede NNs' ability to discern effective strategies, a difficulty that intensifies with the growing complexity of the game positions. These findings underscore the urgent need for advanced methodologies in RL training, specifically tailored to counter the obstacles imposed by noisy evaluations. The development of such methodologies is crucial not only for enhancing NN proficiency in strategy games with significant parity elements but also for broadening the resilience and efficiency of RL systems across diverse and complex environments.
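As a rough illustration of what a curriculum from elementary to more complex positions could look like, the sketch below yields training batches stage by stage, with the bitstring length growing by one at each stage. The staging rule, the function name `curriculum_batches`, and all parameter values are hypothetical choices for illustration; the abstract does not specify the schedule actually used.

```python
import random

def curriculum_batches(max_len, batches_per_stage, batch_size, noise_rate, seed=0):
    """Yield (length, batch) pairs, moving from short to long bitstrings (assumed schedule)."""
    rng = random.Random(seed)
    for n in range(1, max_len + 1):          # one curriculum stage per length
        for _ in range(batches_per_stage):
            batch = []
            for _ in range(batch_size):
                bits = [rng.randint(0, 1) for _ in range(n)]
                label = sum(bits) % 2        # parity label: 1 for an odd number of ones
                if rng.random() < noise_rate:
                    label = 1 - label        # noisy evaluation of the position
                batch.append((bits, label))
            yield n, batch

# Hypothetical usage: inspect the stage lengths of a tiny curriculum.
stage_lengths = [n for n, _ in curriculum_batches(3, 2, 4, noise_rate=0.05)]
print(stage_lengths)
```

A learner would consume these batches in order, so that short positions are seen before long ones while the label noise stays at a fixed rate throughout.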
