Monitoring the Evolution of Behavioural Embeddings in Social Media Recommendation

Srijan Saket, Olivier Jeunen, Md. Danish Kalim

TL;DR

This paper studies how item embeddings evolve on fast-moving short-video platforms, where content arrives continually and user feedback is limited. It compares batch and real-time embedding updates on a ShareChat Hindi dataset, using 32-dimensional embeddings learned with Field-aware Factorization Machines, and introduces a maturity criterion: an embedding counts as mature once $1 - \cos(\mathrm{emb}_t, \mathrm{emb}_{\mathrm{converged}}) < \alpha$. With real-time updates, embeddings mature with only about 20% of the interactions that batch updates require, receive their largest information updates earlier, and converge faster. Real-time updates also reduce norm-driven popularity bias: batch embeddings reach much higher $\lVert \mathrm{emb} \rVert_2$ values, up to ~5x. The results inform production recommender design, suggesting that real-time embeddings can improve early content targeting and user engagement, while highlighting trade-offs in update frequency and bias that have practical implications for scaling and cost.
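
The maturity criterion above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the choice of the last snapshot as the converged embedding, the "first snapshot below the threshold" reading of maturity, and the example value of $\alpha$ are all assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def interactions_to_maturity(snapshots, interaction_counts, alpha):
    """Interaction count at the first snapshot that satisfies the criterion.

    snapshots: embedding of one item after successive updates.
    interaction_counts: interactions the item had received at each snapshot.
    Assumption: the last snapshot stands in for the converged embedding.
    """
    if not snapshots:
        return None
    converged = snapshots[-1]
    for emb, count in zip(snapshots, interaction_counts):
        if cosine_distance(emb, converged) < alpha:
            return count
    return None


# Hypothetical 2-dimensional snapshots (the paper uses 32 dimensions).
snapshots = [[1.0, 0.0], [1.0, 1.0], [0.1, 1.0], [0.0, 1.0]]
counts = [10, 50, 200, 1000]
print(interactions_to_maturity(snapshots, counts, alpha=0.1))
```

Running the same function on snapshots from a batch pipeline and from a real-time pipeline would give two interaction counts per item, which is the kind of comparison the "about 20%" figure summarizes.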

Abstract

Emerging short-video platforms like TikTok, Instagram Reels, and ShareChat present unique challenges for recommender systems, primarily originating from a continuous stream of new content. ShareChat alone receives approximately 2 million pieces of fresh content daily, complicating efforts to assess quality, learn effective latent representations, and accurately match content with the appropriate user base, especially given limited user feedback. Embedding-based approaches are a popular choice for industrial recommender systems because they can learn low-dimensional representations of items, leading to effective recommendation that can easily scale to millions of items and users. Our work characterizes the evolution of such embeddings in short-video recommendation systems, comparing the effect of batch and real-time updates to content embeddings. We investigate how embeddings change with subsequent updates, explore the relationship between embeddings and popularity bias, and highlight their impact on user engagement metrics. Our study unveils the contrast in the number of interactions needed to achieve mature embeddings in a batch learning setup versus a real-time one, identifies the point of highest information updates, and explores the distribution of $\ell_2$-norms across the two competing learning modes. Utilizing a production system deployed on a large-scale short-video app with over 180 million users, our findings offer insights into designing effective recommendation systems and enhancing user satisfaction and engagement in short-video applications.
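
To make the link between $\ell_2$-norms and popularity bias concrete, the sketch below computes a norm ratio between a converged and an initial embedding, and shows that a dot-product score scales linearly with the item embedding's norm. The ratio definition, the function names, and the vectors are assumptions for illustration; dot-product scoring is assumed here because factorization-machine models combine latent vectors through inner products.

```python
import math


def l2_norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def norm_ratio(initial, converged):
    """Ratio of the converged embedding's norm to the initial one (assumed definition)."""
    init_norm = l2_norm(initial)
    if init_norm == 0.0:
        raise ValueError("initial embedding has zero norm")
    return l2_norm(converged) / init_norm


def dot(u, v):
    """Inner product, used here as a stand-in relevance score."""
    return sum(x * y for x, y in zip(u, v))


# Hypothetical vectors: same direction, but the converged norm is 5x larger.
user = [0.5, 0.5]
item_initial = [3.0, 4.0]
item_converged = [15.0, 20.0]

print(norm_ratio(item_initial, item_converged))
print(dot(user, item_initial), dot(user, item_converged))
```

Because the two item vectors point in the same direction, the entire difference in score comes from the norm, which is why larger norms on heavily updated items can favor already popular content.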

Paper Structure

This paper contains 7 sections, 2 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: System comparison of real-time vs batch learning for content embeddings.
  • Figure 2: Comparison of item embeddings' maturity curves for batch & realtime. Realtime achieves convergence significantly faster than batch updates do.
  • Figure 3: Comparison of item embeddings' maturity curves for batch & realtime. Realtime achieves convergence significantly faster than batch updates do.
  • Figure 4: Video click rate and successful video play from experiments on the described dataset.
  • Figure 5: For batched updates, the norm ratio is higher for converged embeddings w.r.t. initial embeddings.
  • ...and 1 more figure