Do Mice Grok? Glimpses of Hidden Progress During Overtraining in Sensory Cortex
Tanishq Kumar, Blake Bordelon, Cengiz Pehlevan, Venkatesh N. Murthy, Samuel J. Gershman
TL;DR
The study investigates whether task-specific cortical representations continue to refine after behavior saturates in a mouse odor-discrimination task. Reanalyzing posterior piriform cortex data, it shows that target and non-target representations continue to separate and that classifier margins continue to increase during overtraining, which predicts improved generalization to held-out odors. A synthetic model and a biologically plausible variant reproduce grokking-like late-time learning and link margin maximization to the observed neural dynamics, offering a mechanistic explanation for overtraining reversal. The work suggests that late-time feature learning occurs in sensory cortex and draws parallels to deep networks, with implications for understanding generalization and robustness under distribution shifts.
Abstract
Does learning of task-relevant representations stop when behavior stops changing? Motivated by recent theoretical advances in machine learning and the intuitive observation that human experts continue to learn from practice even after mastery, we hypothesize that task-specific representation learning can continue even when behavior plateaus. In a novel reanalysis of recently published neural data, we find evidence for such learning in the posterior piriform cortex of mice following continued training on a task, long after behavior saturates at near-ceiling performance ("overtraining"). This learning is marked by an increase in decoding accuracy from piriform neural populations and improved performance on held-out generalization tests. We demonstrate that class representations in cortex continue to separate during overtraining, so that examples that were incorrectly classified at the beginning of overtraining can abruptly be correctly classified later on, despite no changes in behavior during that time. We hypothesize that this hidden yet rich learning takes the form of approximate margin maximization; we validate this and other predictions in the neural data, and we build and interpret a simple synthetic model that recapitulates these phenomena. We conclude by showing how this model of late-time feature learning implies an explanation for the empirical puzzle of overtraining reversal in animal learning, where task-specific representations are more robust to particular task changes because the learned features can be reused.
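To make the notion of margin growth at saturated accuracy concrete, the following is a minimal sketch and not the synthetic model or the neural analysis described above. It assumes a linear readout trained by plain gradient descent on the logistic loss, applied to a small hand-made dataset; the names `X`, `y`, and `train`, the data values, and the hyperparameters are all hypothetical and chosen only for illustration. Training accuracy stands in, loosely, for saturated behavior, and the normalized margin (the smallest signed score divided by the weight norm) stands in for class separation.

```python
import numpy as np

# Hypothetical toy "population responses" (rows) with +1 / -1 class labels.
# Nine easy examples and one hard example per class; not real neural data.
X = np.array([[3.0, 0.0]] * 9 + [[0.5, 2.0]]
             + [[-3.0, 0.0]] * 9 + [[-0.5, -2.0]])
y = np.array([1.0] * 10 + [-1.0] * 10)


def train(X, y, steps=5000, lr=0.2):
    """Gradient descent on the mean logistic loss of a linear readout (no bias)."""
    w = np.zeros(X.shape[1])
    acc_hist, margin_hist = [], []
    for _ in range(steps):
        q = y * (X @ w)                     # signed scores, positive = correct
        coeff = y / (1.0 + np.exp(q))       # y_i * sigmoid(-q_i)
        grad = -(X * coeff[:, None]).mean(axis=0)
        w = w - lr * grad
        q = y * (X @ w)                     # re-score after the update
        acc_hist.append(np.mean(q > 0))     # fraction classified correctly
        margin_hist.append(q.min() / np.linalg.norm(w))  # normalized margin
    return w, np.array(acc_hist), np.array(margin_hist)


w, acc_hist, margin_hist = train(X, y)
for step in (1, 10, 100, 1000, 5000):
    print(f"step {step:5d}  accuracy {acc_hist[step - 1]:.2f}  "
          f"normalized margin {margin_hist[step - 1]:.3f}")
```

In this toy setting, accuracy is already at ceiling after the first update, yet the normalized margin is larger at the end of training than at the start, because later updates are weighted toward the hardest example. Whether a comparable quantity behaves this way in cortex is the empirical question the paper addresses; the sketch only illustrates what "hidden progress" can mean for a classifier.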
