Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
Hojoon Lee, Hyeonseo Cho, Hyunseung Kim, Donghu Kim, Dugki Min, Jaegul Choo, Clare Lyle
TL;DR
The paper investigates why neural networks lose generalization ability when trained with warm-starting and shows that improving trainability alone does not restore generalization in modern architectures. It introduces Hare & Tortoise, a dual-network system in which a slow-moving Tortoise tracks a fast-learning Hare through an exponential moving average of the Hare's weights, and the Hare is periodically reset to the Tortoise's weights, thereby decoupling plasticity from knowledge retention. The method improves generalization in warm-start, continual learning, and RL settings (e.g., Atari-100k), often outperforming reinitialization-based baselines and standard regularizers. This approach offers a practical route to maintaining plasticity without erasing valuable prior knowledge, with implications for data-efficient learning in large-scale models.
Abstract
This study investigates the loss of generalization ability in neural networks, revisiting warm-starting experiments from Ash & Adams. Our empirical analysis reveals that common methods designed to enhance plasticity by maintaining trainability provide limited benefits to generalization. While reinitializing the network can be effective, it also risks losing valuable prior knowledge. To this end, we introduce the Hare & Tortoise, inspired by the brain's complementary learning system. Hare & Tortoise consists of two components: the Hare network, which rapidly adapts to new information analogously to the hippocampus, and the Tortoise network, which gradually integrates knowledge akin to the neocortex. By periodically reinitializing the Hare network to the Tortoise's weights, our method preserves plasticity while retaining general knowledge. Hare & Tortoise can effectively maintain the network's ability to generalize, which improves advanced reinforcement learning algorithms on the Atari-100k benchmark. The code is available at https://github.com/dojeon-ai/hare-tortoise.
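The interplay between the two networks described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes parameters are stored in a plain dictionary, that the Hare is trained with vanilla gradient descent, and that the Tortoise is updated after every Hare step. The names `ema_update`, `reset_hare`, `train`, and `grad_fn`, as well as the values of `lr`, `momentum`, and `reset_interval`, are hypothetical choices made for illustration only.

```python
import numpy as np


def ema_update(tortoise, hare, momentum):
    """Move each Tortoise parameter a small step toward the Hare's value."""
    return {
        name: momentum * tortoise[name] + (1.0 - momentum) * hare[name]
        for name in tortoise
    }


def reset_hare(tortoise):
    """Return a fresh copy of the Tortoise's weights for the Hare to use."""
    return {name: value.copy() for name, value in tortoise.items()}


def train(hare, grad_fn, steps, lr=0.1, momentum=0.999, reset_interval=100):
    """Train the Hare with gradient descent while the Tortoise tracks it.

    grad_fn is a hypothetical callable mapping a parameter dict to a dict
    of gradients with the same keys. All hyperparameter defaults here are
    illustrative assumptions, not values taken from the paper.
    """
    tortoise = reset_hare(hare)  # both networks start from the same weights
    for step in range(1, steps + 1):
        # Fast learner: an ordinary gradient step on the current data.
        grads = grad_fn(hare)
        hare = {name: hare[name] - lr * grads[name] for name in hare}

        # Slow learner: exponential moving average of the Hare's weights.
        tortoise = ema_update(tortoise, hare, momentum)

        # Periodically pull the Hare back to the Tortoise's weights.
        if step % reset_interval == 0:
            hare = reset_hare(tortoise)
    return hare, tortoise


if __name__ == "__main__":
    # Toy problem: minimize (w - 1)^2 starting from w = 0.
    target = np.array([1.0])
    grad_fn = lambda params: {"w": 2.0 * (params["w"] - target)}
    hare, tortoise = train({"w": np.array([0.0])}, grad_fn, steps=250)
    print("hare:", hare["w"], "tortoise:", tortoise["w"])
```

In this sketch, a momentum close to 1 makes the Tortoise change slowly, so each reset discards most of the Hare's recent, possibly overfit, movement while keeping what the Tortoise has gradually absorbed. How the momentum and reset interval should be set for a given task is not specified in the text above.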
