Closed-Form Training Dynamics Reveal Learned Features and Linear Structure in Word2Vec-like Models
Dhruva Karkada, James B. Simon, Yasaman Bahri, Michael R. DeWeese
TL;DR
The paper introduces Quadratic Word Embedding Models (QWEMs) as a tractable proxy for word2vec, obtained from the quartic Maclaurin approximation of the skip-gram with negative sampling (SGNS) loss. It derives a closed-form gradient-flow solution showing that training is sequential: the model learns orthogonal subspaces one at a time, each incrementing the rank of the embeddings. These subspaces are characterized by the top eigen-directions of the target ${\bm{M}}^{*}$ and are learned on explicit timescales $\tau_k$. Empirical validation on a Wikipedia corpus shows that QWEMs reproduce word2vec's training dynamics, learned features, and downstream analogy performance. The authors also connect the formation of linear semantic representations to random-matrix theory (spiked models and Marchenko-Pastur spectra). The work provides a predictive, interpretable theory of feature learning in self-supervised language models and suggests that linear semantic structure emerges early in training as a consequence of the optimization dynamics and the data statistics.
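The sequential, rank-incrementing behavior described above can be illustrated with a toy simulation. The sketch below is not the paper's model or its closed-form solution: it assumes a generic symmetric factorization loss $\frac{1}{4}\lVert {\bm{M}}^{*} - {\bm{W}}{\bm{W}}^{\top}\rVert_F^2$, a small synthetic target with hand-picked eigenvalues, a small random initialization, and plain gradient descent as a stand-in for gradient flow. The names `simulate` and `half_saturation_steps` and all hyperparameters are hypothetical.

```python
import numpy as np

def simulate(eigvals=(4.0, 2.0, 1.0), vocab=6, dim=3,
             init_scale=1e-3, lr=0.01, steps=1500, seed=0):
    """Gradient descent on 0.25 * ||M - W W^T||_F^2 from a small random init.

    Returns the per-step strength of each target eigen-direction in W W^T.
    All sizes and hyperparameters are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # Random orthonormal eigenbasis for the synthetic target matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((vocab, vocab)))
    lam = np.zeros(vocab)
    lam[:len(eigvals)] = eigvals
    M = (Q * lam) @ Q.T  # M = Q diag(lam) Q^T

    W = init_scale * rng.standard_normal((vocab, dim))
    history = []
    for _ in range(steps):
        # Negative gradient of the loss with respect to W is (M - W W^T) W.
        W += lr * (M - W @ W.T) @ W
        # Strength of eigen-direction k in the model: q_k^T W W^T q_k.
        P = Q[:, :len(eigvals)].T @ W
        history.append((P ** 2).sum(axis=1))
    return np.array(history), np.array(eigvals)

def half_saturation_steps(history, eigvals):
    """First step at which each direction reaches half of its target eigenvalue."""
    return np.argmax(history >= 0.5 * eigvals, axis=0)

if __name__ == "__main__":
    history, eigvals = simulate()
    print("half-saturation steps:", half_saturation_steps(history, eigvals))
    print("final strengths:", history[-1])
```

In this toy setting, directions with larger eigenvalues saturate earlier, so the effective rank of ${\bm{W}}{\bm{W}}^{\top}$ grows one direction at a time.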
Abstract
Self-supervised word embedding algorithms such as word2vec provide a minimal setting for studying representation learning in language modeling. We examine the quartic Taylor approximation of the word2vec loss around the origin, and we show that both the resulting training dynamics and the final performance on downstream tasks are empirically very similar to those of word2vec. Our main contribution is to analytically solve for both the gradient flow training dynamics and the final word embeddings in terms of only the corpus statistics and training hyperparameters. The solutions reveal that these models learn orthogonal linear subspaces one at a time, each one incrementing the effective rank of the embeddings until model capacity is saturated. Training on Wikipedia, we find that each of the top linear subspaces represents an interpretable topic-level concept. Finally, we apply our theory to describe how linear representations of more abstract semantic concepts emerge during training; these can be used to complete analogies via vector addition.
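To make the phrase "quartic Taylor approximation around the origin" concrete, the sketch below assumes the standard SGNS objective, whose per-pair terms have the logistic form $-\log\sigma(x)$ with $x = {\bm{w}}^{\top}{\bm{c}}$ the inner product of a word embedding and a context embedding. Under that assumption, truncating at second order in $x$ gives $\log 2 - x/2 + x^{2}/8$, which is quartic in the embedding parameters because $x$ is bilinear in them. The function names are hypothetical, and the code only checks the scalar expansion numerically; it does not reproduce the paper's full loss.

```python
import numpy as np

def logistic_loss(x):
    """Exact per-pair logistic term: -log(sigmoid(x)) = log(1 + exp(-x))."""
    return np.log1p(np.exp(-x))

def quadratic_approx(x):
    """Maclaurin expansion of -log(sigmoid(x)) truncated at second order in x."""
    return np.log(2.0) - x / 2.0 + x ** 2 / 8.0

if __name__ == "__main__":
    for x in (0.05, 0.1, 0.2, 0.5):
        err = abs(logistic_loss(x) - quadratic_approx(x))
        print(f"x = {x:4.2f}   |exact - approx| = {err:.3e}")
```

The cubic term of the expansion vanishes, so the leading error is of order $x^{4}$: the approximation is accurate while the inner products remain small, as they are early in training from a small initialization.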
