Sparse Overcomplete Word Vector Representations
Manaal Faruqui, Yulia Tsvetkov, Dani Yogatama, Chris Dyer, Noah Smith
TL;DR
The paper addresses the gap between dense, hard-to-interpret word vectors and theories of lexical semantics by transforming existing word vectors into long, sparse (and optionally binary) overcomplete representations derived automatically from raw corpora. It presents two methods, sparse coding (A) and nonnegative sparse coding followed by binarization (B), and reports from benchmark evaluations and a word intrusion study that the transformed vectors generally outperform the original vectors and are more interpretable. The approach uses AdaGrad optimization, nonnegativity constraints, and a hyperparameter grid search to balance sparsity against performance. The resulting representations can serve as discrete-style features for NLP models, and the code is publicly released.
Abstract
Current distributed representations of words show little resemblance to theories of lexical semantics. The former are dense and uninterpretable, the latter largely based on familiar, discrete classes (e.g., supersenses) and relations (e.g., synonymy and hypernymy). We propose methods that transform word vectors into sparse (and optionally binary) vectors. The resulting representations are more similar to the interpretable features typically used in NLP, though they are discovered automatically from raw corpora. Because the vectors are highly sparse, they are computationally easy to work with. Most importantly, we find that they outperform the original vectors on benchmark tasks.
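To make the transformation concrete, the following is a minimal NumPy sketch of turning dense vectors into sparse overcomplete codes, with an optional nonnegativity constraint and a binarization step. It is an illustration under stated assumptions, not the authors' released implementation: the function names `sparse_overcomplete` and `binarize`, the objective scaling, and all default hyperparameters are hypothetical, and the optimizer here is alternating proximal gradient (ISTA) with a closed-form ridge update for the dictionary rather than the AdaGrad-based procedure mentioned in the TL;DR.

```python
import numpy as np

def sparse_overcomplete(X, n_atoms, l1=0.1, l2=1e-5, n_iters=200,
                        nonneg=False, seed=0):
    """Sketch of sparse coding on dense word vectors (hypothetical helper).

    Approximately minimizes
        0.5 * ||X - A D||_F^2 + l1 * ||A||_1 + 0.5 * l2 * ||D||_F^2
    X: (n_words, dim) dense vectors.
    Returns codes A (n_words, n_atoms) and dictionary D (n_atoms, dim).
    Choose n_atoms > dim for an overcomplete representation.
    """
    rng = np.random.default_rng(seed)
    n_words, dim = X.shape
    D = rng.normal(scale=0.1, size=(n_atoms, dim))
    A = np.zeros((n_words, n_atoms))
    for _ in range(n_iters):
        # Code update: one proximal gradient (ISTA) step on A.
        # The spectral norm of D D^T is the Lipschitz constant of the gradient.
        lipschitz = np.linalg.norm(D @ D.T, 2) + 1e-12
        grad = (A @ D - X) @ D.T
        A = A - grad / lipschitz
        if nonneg:
            # Prox of the L1 penalty restricted to the nonnegative orthant.
            A = np.maximum(A - l1 / lipschitz, 0.0)
        else:
            # Soft-thresholding: prox of the L1 penalty.
            A = np.sign(A) * np.maximum(np.abs(A) - l1 / lipschitz, 0.0)
        # Dictionary update: exact ridge-regression solution for fixed A.
        D = np.linalg.solve(A.T @ A + l2 * np.eye(n_atoms), A.T @ X)
    return A, D

def binarize(A):
    """Map nonnegative sparse codes to binary vectors (assumed rule: nonzero -> 1)."""
    return (A > 0).astype(np.int8)
```

Calling `sparse_overcomplete(X, n_atoms=10 * X.shape[1], nonneg=True)` and then `binarize` on the resulting codes would mimic the shape of method B; the tenfold expansion factor is only an illustrative choice, and the `l1` penalty has to be tuned to reach the desired sparsity level.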
