
Adaptive Domain Scaling for Personalized Sequential Modeling in Recommenders

Zheng Chai, Hui Lu, Di Chen, Qin Ren, Yuchao Zheng, Xun Zhou

TL;DR

Adaptive Domain Scaling (ADS) tackles multi-domain discrepancies in sequential recommender systems by personalizing both user sequence representations and target candidate representations. It introduces two modules, PSRG and PCRG, implemented via a share-and-private meta-network that conditions embeddings on domain signals and generates multiple personalized queries for candidates. The approach integrates with standard target-attention backbones and is validated on a public dataset and two billion-scale industrial datasets, showing consistent offline gains and significant online revenue lifts in Douyin Ads and Douyin Ecom. ADS has been deployed across ByteDance services, demonstrating practical viability of domain-aware sequential personalization at industry scale.
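ADS plugs into a standard target-attention backbone, so it helps to see that backbone on its own. The minimal Python sketch below shows single-query target attention: the candidate item embedding scores each behavior-sequence item, and a softmax-weighted sum summarizes the sequence. Scaled dot-product scoring and the function names are assumptions made for illustration. DIN-style backbones often score with a small MLP instead.

```python
import math


def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def target_attention(query, keys, values):
    """Pool a behavior sequence with one target (candidate) item query.

    query:  candidate item embedding, length d
    keys:   L sequence-item embeddings, each of length d
    values: L vectors to pool (often the same as keys)
    Returns (pooled vector, attention weights).
    """
    d = len(query)
    # Relevance of each sequence item to the candidate (scaled dot product).
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    # Attention-weighted sum of the value vectors.
    pooled = [
        sum(w * v[j] for w, v in zip(weights, values))
        for j in range(len(values[0]))
    ]
    return pooled, weights


pooled, w = target_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
```

In the terms of the Figure 2 caption, ADS modifies both inputs of such a function: PSRG personalizes the sequence items that serve as keys and values, and PCRG replaces the single query with several generated queries.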

Abstract

Users generally exhibit complex behavioral patterns and diverse intentions across the many business scenarios of super applications like Douyin, which presents great challenges to current industrial multi-domain recommenders. To mitigate the discrepancies across diverse domains, research and industrial practice generally emphasize sophisticated network structures that accommodate diverse data distributions, while neglecting the inherent understanding of user behavioral sequences from a multi-domain perspective. In this paper, we present the Adaptive Domain Scaling (ADS) model, which comprehensively enhances the personalization capability of target-aware sequence modeling across multiple domains. Specifically, ADS comprises two major modules: personalized sequence representation generation (PSRG) and personalized candidate representation generation (PCRG). These modules enable tailored multi-domain learning by dynamically learning both the representations of the items in the user behavioral sequence and the representation of the candidate target item under different domains, facilitating adaptive understanding of user intention. Experiments are performed on a public dataset and two billion-scale industrial datasets, and the extensive results verify the effectiveness and compatibility of ADS. In addition, we conduct online experiments in two influential business scenarios, the Douyin Advertisement Platform and the Douyin E-commerce Service Platform, both of which show substantial business improvements. ADS is currently fully deployed in many recommendation services at ByteDance, serving billions of users.
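The PSRG idea is to generate a per-domain MLP from scenario features and then apply that MLP to every sequence item. The toy Python sketch below illustrates it under stated assumptions. It uses a single linear layer with ReLU. It models "share-and-private" as a shared weight plus a private delta generated from the scenario vector. The class name `PersonalizedSeqMLP`, the dimensions, and the random initialization are all hypothetical and do not reproduce the paper's formulation.

```python
import random


def matvec(m, v):
    # Matrix (list of rows) times vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class PersonalizedSeqMLP:
    """Hypothetical PSRG-style layer whose weights come from scenario features.

    Assumed share-and-private form, with scenario feature vector s:
        W(s) = W_shared + reshape(G_w @ s)
        b(s) = b_shared + G_b @ s
    """

    def __init__(self, d_in, d_out, d_scn, seed=0):
        rng = random.Random(seed)

        def rand():
            return rng.uniform(-0.1, 0.1)

        self.d_in, self.d_out = d_in, d_out
        # Shared parameters: common to all domains.
        self.w_shared = [[rand() for _ in range(d_in)] for _ in range(d_out)]
        self.b_shared = [rand() for _ in range(d_out)]
        # Private generators: map scenario features to parameter deltas.
        self.g_w = [[rand() for _ in range(d_scn)] for _ in range(d_out * d_in)]
        self.g_b = [[rand() for _ in range(d_scn)] for _ in range(d_out)]

    def generate(self, scn):
        # Build the scenario-specific weight matrix and bias.
        flat_dw = matvec(self.g_w, scn)
        w = [
            [self.w_shared[i][j] + flat_dw[i * self.d_in + j] for j in range(self.d_in)]
            for i in range(self.d_out)
        ]
        b = [bs + db for bs, db in zip(self.b_shared, matvec(self.g_b, scn))]
        return w, b

    def __call__(self, seq, scn):
        # The same generated layer (linear + ReLU) is applied to every item.
        w, b = self.generate(scn)
        return [[max(0.0, z + bi) for z, bi in zip(matvec(w, item), b)] for item in seq]


layer = PersonalizedSeqMLP(d_in=4, d_out=3, d_scn=2)
personalized = layer([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], scn=[1.0, 0.0])
```

With an all-zero scenario vector, the generated layer collapses to the shared parameters. This suggests one reading of the share-and-private split: the shared part carries what is common to all domains, and the private part adapts it to each domain.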

Paper Structure

This paper contains 18 sections, 12 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Typical business scenarios in Douyin.
  • Figure 2: Overview of ADS. ADS consists of PCRG, PSRG, and a target attention module. Given scenario-related features and the target item as input, PCRG first generates multiple queries that capture the co-pattern of the target item (query) and the scenario features. PSRG takes scenario features as input and generates the weight and bias parameters of a personalized MLP; the original sequence item embeddings are then passed through this generated MLP to obtain personalized representations. PCRG and PSRG both follow a share-and-private learning paradigm. Finally, the generated multiple queries aggregate the personalized sequence through the target-aware attention mechanism, and a concatenation layer followed by high-level MLP layers makes the prediction.
  • Figure 3: Comparison of traditional target-attention methods (Left) and the Multi-Query Gen-Net (Middle and Right).
  • Figure 4: Model parameter and training FLOPs patterns when varying the number of chunks in ADS, where "Chunk K" means that K items are grouped into a chunk in PCRG.
  • Figure 5: Performance patterns by varying the number of chunks in ADS.
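Figures 2 to 4 describe PCRG as generating multiple queries and grouping items into chunks. The minimal Python sketch below assumes the following: each chunk of K consecutive sequence items is summarized by its own query, that query is generated from the target embedding and the scenario features, and the per-chunk summaries are concatenated. This reading of "chunk" is an assumption. The linear per-chunk generators and all names (`MultiQueryGen`, `split_chunks`, `attend`) are hypothetical stand-ins for the paper's Multi-Query Gen-Net.

```python
import math
import random


def split_chunks(seq, k):
    # Group consecutive behavior items into chunks of at most k items.
    return [seq[i:i + k] for i in range(0, len(seq), k)]


def attend(query, items):
    # Scaled dot-product target attention of one query over one chunk.
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, it)) / math.sqrt(d) for it in items]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [sum(e / z * it[j] for e, it in zip(exps, items)) for j in range(d)]


class MultiQueryGen:
    """Hypothetical PCRG stand-in: one linear query generator per chunk.

    Each generator maps concat(target, scenario) to a d-dim query.
    """

    def __init__(self, seq_len, k, d, d_scn, seed=0):
        rng = random.Random(seed)
        self.k = k
        self.n_chunks = math.ceil(seq_len / k)
        self.gens = [
            [[rng.uniform(-0.1, 0.1) for _ in range(d + d_scn)] for _ in range(d)]
            for _ in range(self.n_chunks)
        ]

    def num_params(self):
        # Parameters of all per-chunk query generators.
        return sum(len(g) * len(g[0]) for g in self.gens)

    def __call__(self, seq, target, scn):
        x = list(target) + list(scn)
        out = []
        # zip stops early if seq is shorter than the seq_len used at init.
        for gen, chunk in zip(self.gens, split_chunks(seq, self.k)):
            query = [sum(w * xi for w, xi in zip(row, x)) for row in gen]
            out.extend(attend(query, chunk))  # concatenate chunk summaries
        return out


seq = [[1.0 if i == j % 4 else 0.0 for i in range(4)] for j in range(8)]
fine = MultiQueryGen(seq_len=8, k=2, d=4, d_scn=2)    # 4 chunks
coarse = MultiQueryGen(seq_len=8, k=4, d=4, d_scn=2)  # 2 chunks
```

In this toy, a larger K means fewer chunks, and therefore fewer query generators and a shorter concatenated output. That is the kind of parameter and FLOPs trade-off Figure 4 plots, but the sketch does not reproduce the paper's numbers.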