HiSAC: Hierarchical Sparse Activation Compression for Ultra-long Sequence Modeling in Recommenders

Kun Yuan, Junyu Bi, Daixuan Cheng, Changfa Wu, Shuwen Xiao, Binbin Cao, Jian Wu, Yuning Jiang

TL;DR

HiSAC (Hierarchical Sparse Activation Compression) is an efficient framework for personalized modeling of ultra-long user behavior sequences. It achieves significant compression and cost reduction, and online A/B tests show a consistent 1.65% CTR uplift, demonstrating its scalability and real-world effectiveness.

Abstract

Modern recommender systems leverage ultra-long user behavior sequences to capture dynamic preferences, but end-to-end modeling is infeasible in production due to latency and memory constraints. While summarizing history via interest centers offers a practical alternative, existing methods struggle to (1) identify user-specific centers at appropriate granularity and (2) accurately assign behaviors, leading to quantization errors and loss of long-tail preferences. To alleviate these issues, we propose Hierarchical Sparse Activation Compression (HiSAC), an efficient framework for personalized sequence modeling. HiSAC encodes interactions into multi-level semantic IDs and constructs a global hierarchical codebook. A hierarchical voting mechanism sparsely activates personalized interest-agents as fine-grained preference centers. Guided by these agents, Soft-Routing Attention aggregates historical signals in semantic space, weighting by similarity to minimize quantization error and retain long-tail behaviors. Deployed on Taobao's "Guess What You Like" homepage, HiSAC achieves significant compression and cost reduction, with online A/B tests showing a consistent 1.65% CTR uplift, demonstrating its scalability and real-world effectiveness.
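The abstract names three mechanisms (multi-level semantic IDs, hierarchical voting over a hierarchical codebook, and Soft-Routing Attention) without giving formulas. The following is a minimal, stdlib-only Python sketch of one plausible reading, not the authors' implementation. Assumptions: fixed toy codebooks stand in for a trained RQ-VAE; semantic IDs come from greedy residual quantization; voting keeps the top_k[l] most frequent prefixes at each level among children of surviving prefixes; an agent's vector is the sum of the codewords on its prefix path; routing is a dot-product softmax over agents with temperature tau, and each agent pools a weighted mean of all behaviors. All function names (residual_quantize, hierarchical_vote, prefix_vector, route_weights, soft_routing) are hypothetical.

```python
import math
from collections import Counter

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def residual_quantize(x: Vector, codebooks: list[list[Vector]]) -> tuple[int, ...]:
    """Greedy residual quantization, coarse to fine (stand-in for a trained RQ-VAE)."""
    residual, codes = list(x), []
    for book in codebooks:
        idx = min(range(len(book)), key=lambda i: sq_dist(residual, book[i]))
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]  # leftover goes to next level
    return tuple(codes)


def hierarchical_vote(sids: list[tuple[int, ...]], top_k: list[int]) -> list[tuple[int, ...]]:
    """Hypothetical rule: at level l keep the top_k[l] most frequent prefixes whose
    parent prefix survived level l-1; the deepest survivors are the interest-agents."""
    survivors: list[tuple[int, ...]] = [()]
    for level, k in enumerate(top_k):
        parents = set(survivors)
        votes = Counter(sid[: level + 1] for sid in sids if sid[:level] in parents)
        survivors = [prefix for prefix, _ in votes.most_common(k)]
    return survivors


def prefix_vector(prefix: tuple[int, ...], codebooks: list[list[Vector]]) -> Vector:
    """Agent embedding = sum of the codewords along its prefix path (an assumption)."""
    vec = [0.0] * len(codebooks[0][0])
    for level, idx in enumerate(prefix):
        vec = [v + c for v, c in zip(vec, codebooks[level][idx])]
    return vec


def route_weights(e: Vector, agents: list[Vector], tau: float) -> list[float]:
    """Softmax over agents of dot(e, agent) / tau."""
    logits = [dot(e, a) / tau for a in agents]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [w / total for w in exps]


def soft_routing(behaviors: list[Vector], agents: list[Vector], tau: float) -> list[Vector]:
    """Split each behavior across all agents by its routing weights; each agent
    returns the weighted mean of the behaviors routed to it."""
    dim = len(behaviors[0])
    pooled = [[0.0] * dim for _ in agents]
    mass = [0.0] * len(agents)
    for e in behaviors:
        for k, w in enumerate(route_weights(e, agents, tau)):
            mass[k] += w
            pooled[k] = [p + w * x for p, x in zip(pooled[k], e)]
    return [[p / max(m, 1e-12) for p in row] for row, m in zip(pooled, mass)]


# Toy example: 2-D embeddings, 2 quantization levels, 3 codes per level.
codebooks = [
    [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],  # level 1 (coarse)
    [[0.2, 0.0], [0.0, 0.2], [-0.2, 0.0]],  # level 2 (fine)
]
history = [
    [1.15, 0.02], [0.9, 0.25], [1.2, 0.0], [0.2, 1.0],
    [1.25, -0.05], [0.15, 0.95], [-1.0, 0.1],  # last one is a long-tail behavior
]
sids = [residual_quantize(e, codebooks) for e in history]
agents = hierarchical_vote(sids, top_k=[2, 2])
agent_vecs = [prefix_vector(p, codebooks) for p in agents]
compressed = soft_routing(history, agent_vecs, tau=0.5)
print(sids)        # semantic IDs, one tuple per behavior
print(agents)      # activated interest-agent prefixes
print(compressed)  # one pooled vector per agent
```

In this toy history, the single behavior with semantic ID (2, 1) falls under a prefix that wins no vote, so no agent is built for it; under soft routing it still contributes to every pooled vector in proportion to its similarity weights, rather than being dropped as it would be if only behaviors sharing an activated agent's prefix were pooled.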

Paper Structure (25 sections, 13 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Overview of HiSAC, which compresses ultra-long user sequences in three stages: (a) tokenization of historical behaviors; (b) sparse activation via hierarchical voting to select interest-agents; (c) soft-routing attention to produce the compressed representation.
  • Figure 2: Comparison of industrial deployment architectures: (a) standard HiSAC combined with multi-head attention (MHA); (b) optimized deployment with online and offline caches for lower latency and cost.
  • Figure 3: Effect of scaling sequence length on model performance.
  • Figure 4: Parameter analysis for interest-agent generation: the impact of the RQ-VAE quantization levels, the number of interest-agents $K$, and the top-$k$ strategy in hierarchical voting. Each curve reports model performance as one parameter varies while the other settings are held fixed.
  • Figure 5: Impact of the temperature coefficient $\tau$ on AUC and on the distribution of attention weights.
  • ...and 1 more figure
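Figure 5 studies how the temperature coefficient $\tau$ affects AUC and the spread of attention weights. The sketch below does not reproduce those results; it only illustrates the general mechanism, assuming $\tau$ divides similarity scores before a softmax (standard temperature scaling; the exact form used in HiSAC is not given in this summary). The scores and the helper names softmax_with_temperature and entropy are hypothetical.

```python
import math


def softmax_with_temperature(scores: list[float], tau: float) -> list[float]:
    """Softmax of scores / tau; a smaller tau gives a sharper distribution."""
    scaled = [s / tau for s in scores]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats; 0 for a one-hot distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


scores = [0.9, 0.6, 0.3, 0.1]  # hypothetical similarities of one behavior to four agents
for tau in (0.1, 0.5, 2.0):
    w = softmax_with_temperature(scores, tau)
    print(f"tau={tau}: max weight={max(w):.3f}, entropy={entropy(w):.3f}")
```

A small $\tau$ pushes the weights toward a near one-hot assignment to the most similar agent, which behaves like hard quantization, while a large $\tau$ spreads weight more evenly across agents; a sweep such as the one in Figure 5 explores the trade-off between these two regimes.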