Monitoring the Evolution of Behavioural Embeddings in Social Media Recommendation
Srijan Saket, Olivier Jeunen, Md. Danish Kalim
TL;DR
This paper studies how item embeddings evolve on fast-moving short-video platforms, where content grows continually and user feedback is limited. It compares batch and real-time embedding updates on a ShareChat Hindi dataset, using 32-dimensional embeddings learned via Field-aware Factorization Machines. It introduces a maturity criterion, $1 - \cos({\rm emb}_t, {\rm emb}_{\rm converged}) < \alpha$, under which an embedding at time $t$ counts as mature once its cosine distance to its converged value falls below a threshold $\alpha$. Key findings: real-time updates yield mature embeddings with only about 20% of the interactions that batch updates require, show their information updates earlier and converge faster, and reduce norm-driven popularity bias (batch embeddings reach much higher $\lVert \text{emb} \rVert_2$ values, up to ~5x). The results inform production recommender design, suggesting that real-time embeddings can improve early content targeting and user engagement, while highlighting trade-offs in update frequency and bias, with practical implications for scaling and cost.
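As an illustration of the maturity criterion above, the following is a minimal sketch, not the authors' implementation. The function names, the default threshold `alpha=0.1`, and the choice to report the first snapshot that satisfies the criterion are all assumptions made for this example; the summary does not state the value of $\alpha$ used in the paper.

```python
import math
from typing import Optional, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(a, b) for two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def is_mature(
    emb_t: Sequence[float],
    emb_converged: Sequence[float],
    alpha: float = 0.1,  # assumed threshold, not taken from the paper
) -> bool:
    """Check the criterion 1 - cos(emb_t, emb_converged) < alpha."""
    return cosine_distance(emb_t, emb_converged) < alpha


def first_mature_step(
    snapshots: Sequence[Sequence[float]],
    emb_converged: Sequence[float],
    alpha: float = 0.1,  # assumed threshold, not taken from the paper
) -> Optional[int]:
    """Index of the first snapshot that meets the criterion, else None.

    Assumption: "maturity" is reported at the first qualifying snapshot;
    a stricter variant could require all later snapshots to qualify too.
    """
    for step, emb in enumerate(snapshots):
        if is_mature(emb, emb_converged, alpha):
            return step
    return None
```

Given a sequence of snapshots of one item's embedding (for example, one per update) and its final converged embedding, `first_mature_step` returns the update index at which the item would be considered mature under these assumptions. Mapping that index to a number of interactions depends on how snapshots are taken, which this sketch leaves open.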
Abstract
Emerging short-video platforms like TikTok, Instagram Reels, and ShareChat present unique challenges for recommender systems, primarily originating from a continuous stream of new content. ShareChat alone receives approximately 2 million pieces of fresh content daily, complicating efforts to assess quality, learn effective latent representations, and accurately match content with the appropriate user base, especially given limited user feedback. Embedding-based approaches are a popular choice for industrial recommender systems because they can learn low-dimensional representations of items, leading to effective recommendation that can easily scale to millions of items and users. Our work characterizes the evolution of such embeddings in short-video recommendation systems, comparing the effect of batch and real-time updates to content embeddings. We investigate \emph{how} embeddings change with subsequent updates, explore the relationship between embeddings and popularity bias, and highlight their impact on user engagement metrics. Our study unveils the contrast in the number of interactions needed to achieve mature embeddings in a batch learning setup versus a real-time one, identifies the point of highest information updates, and explores the distribution of $\ell_2$-norms across the two competing learning modes. Utilizing a production system deployed on a large-scale short-video app with over 180 million users, our findings offer insights into designing effective recommendation systems and enhancing user satisfaction and engagement in short-video applications.
