StarSpace: Embed All The Things!
Ledell Wu, Adam Fisch, Sumit Chopra, Keith Adams, Antoine Bordes, Jason Weston
TL;DR
StarSpace is a general neural embedding framework that represents each entity as a bag of discrete features and learns the feature embeddings by optimizing a ranking or classification objective. By embedding heterogeneous entities into a shared space and training with $k$-negative sampling, StarSpace unifies text labeling, retrieval, collaborative filtering, and knowledge-graph embedding under a single objective. The authors report competitive performance on text classification, content-based recommendation, link prediction, Wikipedia search, sentence matching, and transferable sentence embeddings, often outperforming task-specific baselines. A key contribution is the handling of out-of-sample items and users: because they are represented through their features, new entities can be embedded from those features, which makes the framework widely applicable and a practical baseline for many downstream applications. The work points to possible extensions to continuous features and multimodal data that would preserve the framework's generality.
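The training objective described above can be illustrated with a minimal sketch of one loss evaluation. It assumes that an entity's embedding is the sum of its feature vectors, that similarity is cosine, and that the loss is a hinge (margin ranking) loss of one positive pair against $k$ negatives sampled from the candidate set. The function names `embed`, `cosine`, and `k_negative_loss`, the margin of 0.2, and the toy feature table are hypothetical illustrations, not the authors' implementation, and gradient updates are omitted.

```python
import math
import random


def embed(bag, table):
    """Embed an entity as the sum of its feature vectors (assumed bag-of-features model)."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for feature in bag:
        for i, x in enumerate(table[feature]):
            vec[i] += x
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def k_negative_loss(lhs, pos, candidates, table, k, margin=0.2, rng=random):
    """Hinge ranking loss of the pair (lhs, pos) against k sampled negatives.

    Each term max(0, margin - sim(lhs, pos) + sim(lhs, neg)) is positive
    whenever a negative scores within `margin` of the positive.
    """
    pool = [c for c in candidates if c != pos]  # never sample the positive itself
    negatives = rng.sample(pool, k)
    a = embed(lhs, table)
    s_pos = cosine(a, embed(pos, table))
    return sum(max(0.0, margin - s_pos + cosine(a, embed(neg, table)))
               for neg in negatives)


# Hypothetical toy table: 2-d vectors for words and labels.
table = {
    "good": [1.0, 0.0], "movie": [0.5, 0.5],
    "label:pos": [1.0, 0.1], "label:neg": [-1.0, 0.0], "label:neutral": [0.0, 1.0],
}
labels = [("label:pos",), ("label:neg",), ("label:neutral",)]
doc = ("good", "movie")

print(k_negative_loss(doc, ("label:pos",), labels, table, k=2, rng=random.Random(0)))  # 0.0
```

Here the correct label already outranks both negatives by more than the margin, so the loss is zero; pairing the document with `("label:neg",)` instead yields a positive loss that training would push down.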
Abstract
We present StarSpace, a general-purpose neural embedding model that can solve a wide variety of problems: labeling tasks such as text classification, ranking tasks such as information retrieval/web search, collaborative filtering-based or content-based recommendation, embedding of multi-relational graphs, and learning word, sentence or document level embeddings. In each case the model works by embedding entities composed of discrete features and comparing them against each other, learning similarities that depend on the task. Empirical results on a number of tasks show that StarSpace is highly competitive with existing methods, whilst also being generally applicable to new cases where those methods are not.
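The "embed and compare" step can be sketched at inference time as ranking candidate entities against a query. This minimal sketch assumes each entity is a bag of features whose vectors are summed, uses the inner product as the similarity, and skips features missing from the table; the names `embed`, `dot`, `rank`, the `topic:` feature prefix, and the toy vectors are hypothetical, not the library's API.

```python
def embed(bag, table):
    """Sum the vectors of a bag's features; features absent from the table are skipped."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for feature in bag:
        for i, x in enumerate(table.get(feature, ())):
            vec[i] += x
    return vec


def dot(u, v):
    """Inner-product similarity between two embeddings."""
    return sum(a * b for a, b in zip(u, v))


def rank(query_bag, candidates, table):
    """Return candidate names sorted by similarity to the query, best first."""
    q = embed(query_bag, table)
    scored = [(dot(q, embed(bag, table)), name) for name, bag in candidates.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored]


# Hypothetical learned vectors for words and topic labels.
table = {
    "goal": [1.0, 0.0], "match": [0.8, 0.2], "vote": [0.0, 1.0],
    "topic:sports": [1.0, 0.0], "topic:politics": [0.0, 1.0],
}
labels = {"sports": ("topic:sports",), "politics": ("topic:politics",)}

print(rank(("goal", "match"), labels, table))        # ['sports', 'politics']
print(rank(("vote", "unseen_word"), labels, table))  # ['politics', 'sports']
```

The same `rank` call could, for example, score items for a user represented as a bag of previously liked items; only the contents of the bags change, which is the sense in which one comparison covers several tasks.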
