Trinity: Syncretizing Multi-/Long-tail/Long-term Interests All in One

Jing Yan, Liu Jiang, Jianfei Cui, Zhichen Zhao, Xingyan Bin, Feng Zhang, Zuotao Liu

TL;DR

The paper addresses interest amnesia in recommender systems by introducing Trinity, a statistics-based retrieval framework that unifies multi-, long-tail, and long-term interest modeling through long-term cues. It builds a two-level clustering system ($J=128$, $K=1024$) and long-term behavior histograms via a SIM head and VQ-VAE, enabling time-variant, scalable clustering that discretizes items into enumerable clusters. Three specialized retrievers (Trinity-M, Trinity-LT, and Trinity-L) target multi-, long-tail, and long-term interests respectively, with tailored strategies and sampling to balance diversity, coverage, and relevance; a re-rank stage refines candidates. Deployed on Douyin and Douyin Lite, Trinity improves user experience (e.g., increases in AAD and AAH) and stays resilient to emerging hot topics, while keeping computational overhead modest. The work offers practical, generalizable guidance for industrial recommender systems on mitigating interest amnesia by leveraging long-term statistics in retrieval.

Abstract

Interest modeling in recommender systems has been a constant topic for improving user experience, and typical interest modeling tasks (e.g., multi-interest, long-tail interest, and long-term interest) have been investigated in many existing works. However, most of them consider only one kind of interest in isolation and neglect how the tasks relate to one another. In this paper, we argue that these tasks suffer from a common "interest amnesia" problem, and that a single solution can mitigate it for all of them simultaneously. We observe that long-term cues can be the cornerstone, since they reveal multi-interest and clarify long-tail interest. Inspired by this observation, we propose a novel and unified framework in the retrieval stage, "Trinity", to solve the interest amnesia problem and improve multiple interest modeling tasks. We construct a real-time clustering system that projects items into enumerable clusters, and we calculate statistical interest histograms over these clusters. Based on these histograms, Trinity recognizes underdelivered themes and remains stable when facing emerging hot topics. Trinity is well suited to large-scale industrial scenarios because of its modest computational overhead. Its derived retrievers have been deployed on the recommender system of Douyin, significantly improving user experience and retention. We believe that this practical experience generalizes well to other scenarios.
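To make the histogram idea in the abstract concrete, here is a minimal sketch of how statistical interest histograms over enumerable clusters could expose an underdelivered theme. It is an illustration under stated assumptions, not the paper's actual strategy: the function names, the `min_share` and `ratio` thresholds, and the toy cluster IDs are all hypothetical, and each item is assumed to have already been mapped to a single cluster ID.

```python
from collections import Counter


def interest_histogram(cluster_ids, num_clusters):
    """Normalized histogram of a behavior sequence over cluster IDs."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * num_clusters
    return [counts.get(c, 0) / total for c in range(num_clusters)]


def underdelivered_clusters(long_term_hist, recent_hist,
                            min_share=0.1, ratio=0.5):
    """Return clusters that hold a sizable long-term share but whose
    recent share has dropped below `ratio` times that long-term share.

    Both thresholds are hypothetical knobs chosen for illustration.
    """
    return [
        c
        for c, (lt, rc) in enumerate(zip(long_term_hist, recent_hist))
        if lt >= min_share and rc < ratio * lt
    ]


# Toy data: cluster IDs of items in a user's long-term and recent history.
long_term = [0, 0, 1, 0, 2, 2, 0, 1, 0, 0]
recent = [0, 0, 1, 0]

lt_hist = interest_histogram(long_term, num_clusters=4)
rc_hist = interest_histogram(recent, num_clusters=4)
forgotten = underdelivered_clusters(lt_hist, rc_hist)
print(forgotten)
```

In this toy example, cluster 2 accounts for a fifth of the long-term history but is absent from the recent one, so it is the only cluster flagged. Because the statistics are simple counts over a fixed set of clusters, the cost grows with the sequence length and the number of clusters rather than with the size of the item corpus.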

Paper Structure (15 sections, 3 equations, 6 figures, 3 tables, 2 algorithms)

This paper contains 15 sections, 3 equations, 6 figures, 3 tables, 2 algorithms.

Figures (6)

  • Figure 1: The interest reciprocity relationship and inspiration of Trinity. Multi-/long-term/long-tail interests are mutually dependent and reinforcing. See text for detailed examples.
  • Figure 2: The framework of the proposed Trinity: (a) The training procedure. (b) By projecting the user's long-term behavior sequence into histograms and applying a customized strategy, multi-interest can be captured. (c) By comparing the user's behavior with global cluster popularity, interests in long-tail themes can be clarified. (d) Time-agnostic embeddings produced in the training phase are applied for i2i search retrieval.
  • Figure 3: Watch history of a sampled user, which reveals her short-term interests (e.g., games and talk shows) and a long-term interest in economics (connected by dashed lines).
  • Figure 4: In Multi-U methods, each user representation (yellow dashed circle) searches for candidates individually, so some items (solid hollow circles) may be retrieved more than once (deeper gray indicates more retrievals). As the head count increases, the redundant computational overhead grows heavily. In Trinity, by contrast, items are assigned to clusters exclusively, so each item can be retrieved at most once.
  • Figure 5: The visualization of items belonging to Trinity clusters. Items of the same row share the same secondary cluster ID.
  • ...and 1 more figure