
Aggregate and Broadcast: Scalable and Efficient Feature Interaction for Recommender Systems

Kaiyuan Li, Yongxiang Tang, Wenzheng Shu, Yanxiang Zeng, Chao Wang, Yanhua Cheng, Xialong Liu, Peng Jiang

TL;DR

The paper tackles the challenge of making feature interaction both expressive and scalable in large-scale recommender systems. It introduces INFNet, a hub-token-mediated aggregate-and-broadcast architecture that preserves token width and has linear complexity, enabling task-aware interaction across categorical, sequential, and task features. Using a two-phase interaction (global aggregation via cross-attention into compact hub tokens, followed by an affine broadcast back to the local tokens) together with multi-task optimization, INFNet outperforms strong baselines and scales well on a public benchmark and a large industrial dataset, and its online deployment yields revenue and CTR gains. The work backs its claim of improved information flow with ablations and visualizations, and offers practical guidance on hub budgets, hub initialization, and efficiency trade-offs in industrial settings.
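To make the "quadratic versus linear" claim concrete, the sketch below simply counts pairwise attention scores per layer. It assumes that all-to-all interaction scores every token pair and that hub-based aggregation has only a fixed number of hubs H query the N tokens; the token and hub counts are illustrative values, not figures from the paper.

```python
def all_to_all_scores(n_tokens: int) -> int:
    # All-to-all interaction: every token scores every token -> N^2 per layer.
    return n_tokens * n_tokens


def hub_scores(n_tokens: int, n_hubs: int) -> int:
    # Aggregate phase: only the H hubs query the N tokens -> H * N per layer.
    # The broadcast phase is an element-wise affine update (O(N * d)), no pairwise scores.
    return n_hubs * n_tokens


# Illustrative sizes only (not taken from the paper).
n_tokens = 100 + 1000 + 4   # categorical + sequence + task tokens
n_hubs = 8 + 8 + 4          # hubs per group, kept fixed as N grows

print(all_to_all_scores(n_tokens))  # 1218816
print(hub_scores(n_tokens, n_hubs))  # 22080

# Doubling N quadruples the all-to-all cost but only doubles the hub-based cost.
print(all_to_all_scores(2 * n_tokens) // all_to_all_scores(n_tokens))  # 4
print(hub_scores(2 * n_tokens, n_hubs) // hub_scores(n_tokens, n_hubs))  # 2
```

Because the hub budget is a constant chosen by the practitioner, the hub-based count grows linearly in N, which is what lets long behavior sequences stay at full width.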

Abstract

Feature interaction is a core ingredient in ranking models for large-scale recommender systems, yet making it both expressive and scalable remains challenging. Exhaustive pairwise interaction is powerful but incurs quadratic complexity in the number of tokens/features, while many efficient alternatives rely on restrictive structures that limit information exchange. We further identify two common bottlenecks in practice: (1) early aggregation of behavior sequences compresses fine-grained signals, making it difficult for deeper layers to reuse item-level details; and (2) late fusion injects task signals only at the end, preventing task objectives from directly guiding the interaction process. To address these issues, we propose the Information Flow Network (INFNet), a lightweight architecture that enables scalable, task-aware feature interaction with linear complexity. INFNet represents categorical features, behavior sequences, and task identifiers as tokens, and introduces a small set of hub tokens for each group to serve as communication hubs. Interaction is realized through an efficient aggregate-and-broadcast information flow: hub tokens aggregate global context across groups via cross-attention, and a lightweight gated broadcast unit injects the refined context back to update the categorical, sequence, and task tokens. This design supports width-preserving stacking, which retains item-level signals in behavior sequences and enables task-guided interaction throughout the network, while reducing interaction cost from quadratic to linear in the number of feature tokens. Experiments on a public benchmark and a large-scale industrial dataset demonstrate that INFNet consistently outperforms strong baselines and exhibits strong scaling behavior. In a commercial online advertising system, deploying INFNet improves revenue by +1.587% and click-through rate by +1.155%.
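The aggregate-and-broadcast flow described in the abstract can be sketched in NumPy as below. This is a minimal, non-authoritative illustration under stated assumptions: single-head attention, a residual update of the hubs, and a broadcast step that mean-pools each group's refined hubs into per-channel parameters alpha (a sigmoid gate) and beta. The helper names (`cross_attention`, `affine_broadcast`, `infnet_block`), the weight names, and all sizes are hypothetical and not taken from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(hubs, tokens, p):
    # Single-head cross-attention: H hubs query N tokens, cost O(H * N * d).
    q, k, v = hubs @ p["Wq"], tokens @ p["Wk"], tokens @ p["Wv"]
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (H, N)
    return softmax(scores, axis=-1) @ v      # (H, d)


def affine_broadcast(refined_hubs, tokens, p):
    # Assumed form: mean-pool the group's hubs into one context vector, then
    # derive per-channel alpha (sigmoid gate) and beta from it.
    ctx = refined_hubs.mean(axis=0)                  # (d,)
    alpha = 1.0 / (1.0 + np.exp(-(ctx @ p["Wa"])))   # (d,), values in (0, 1)
    beta = ctx @ p["Wb"]                             # (d,)
    return alpha * tokens + beta                     # (N, d): width preserved


def infnet_block(groups, hubs, params):
    # Phase 1 (aggregate): every group's hubs attend over the union of all tokens.
    union = np.concatenate(list(groups.values()), axis=0)
    new_hubs = {g: hubs[g] + cross_attention(hubs[g], union, params[g]) for g in groups}
    # Phase 2 (broadcast): refined hubs modulate only their own group's tokens.
    new_groups = {g: affine_broadcast(new_hubs[g], groups[g], params[g]) for g in groups}
    return new_groups, new_hubs


d = 16
sizes = {"C": 10, "S": 50, "T": 3}   # hypothetical token counts per group
n_hubs = {"C": 4, "S": 4, "T": 3}    # hypothetical hub budget per group
groups = {g: rng.normal(size=(n, d)) for g, n in sizes.items()}
hubs = {g: rng.normal(size=(n_hubs[g], d)) for g in sizes}
params = {
    g: {w: rng.normal(scale=d ** -0.5, size=(d, d)) for w in ("Wq", "Wk", "Wv", "Wa", "Wb")}
    for g in sizes
}

for _ in range(3):  # stack three blocks
    groups, hubs = infnet_block(groups, hubs, params)

print({g: x.shape for g, x in groups.items()})  # {'C': (10, 16), 'S': (50, 16), 'T': (3, 16)}
```

No N-by-N matrix is ever formed: the largest score matrix has shape (H, N_total), and every group keeps its original token count after each block, which is what "width-preserving stacking" refers to.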

Paper Structure

This paper contains 23 sections, 12 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Comparison of feature interaction paradigms when stacking L layers. (a) All-to-All Interaction (e.g., HSTU) enables exhaustive connectivity but suffers from quadratic complexity. (b) Efficient Interaction (e.g., RankMixer) relies on Early Aggregation to compress sequences and on Late Task Fusion, both of which act as bottlenecks for information flow. (c) Aggregate-and-Broadcast Interaction (Ours): INFNet uses an Aggregate-and-Broadcast mechanism mediated by hub tokens. It maintains a width-preserving architecture with linear complexity, ensuring effective signal propagation and task-guided interaction.
  • Figure 2: The overall architecture of INFNet. (Top Panel) The model workflow. Left: Group-wise hubs are initialized via distinct strategies (MLP for Categorical, Pooling for Sequence, and Hybrid for Task). Right: The stacked INFNet Blocks process all feature groups with the aggregate-and-broadcast mechanism. (Bottom Panel) Detailed illustration of the two-phase interaction mechanism, using the Categorical Group as an example: 1) Multi-view Global Aggregation: Hubs (e.g., $\tilde{C}^{(l)}$) act as queries to gather global context from the union of all original tokens ($\{C, S, T\}$) via Cross-Attention. 2) Global-to-Local Affine Broadcast: The refined hubs ($\tilde{C}^{(l+1)}$) generate affine parameters ($\alpha, \beta$) that modulate their corresponding original tokens ($C^{(l)}$), broadcasting global context back to local features.
  • Figure 3: Component effectiveness analysis. We report the average AUC across all tasks for INFNet and its variants, each representing the removal or degradation of a key interaction phase.
  • Figure 4: Sensitivity analysis of hub composition. We evaluate the impact of (a) the number of categorical hubs $N_c$ and (b) shared task hubs $N_s$ on the Industrial dataset.
  • Figure 5: Heatmap of cross-attention weights between task hubs (Y-axis) and input tokens (X-axis) on the KuaiRand dataset. For visualization, the numerous sequence tokens are aggregated by behavior type.
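Figure 2 names three hub initialization strategies (MLP for categorical, pooling for sequence, hybrid for task) without further detail here, so the sketch below fills them in with assumptions. The categorical MLP maps flattened tokens to a fixed number of hubs; the sequence pooling mean-pools contiguous chunks of the behavior sequence; and the "hybrid" task initialization is a hypothetical combination of learned task embeddings with a pooled categorical context. All function names, weights, and sizes are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_categorical_hubs(cat_tokens, W1, W2, n_hubs):
    # "MLP" strategy: flatten all categorical tokens, map them through a
    # two-layer MLP, and reshape the output into n_hubs hub vectors.
    flat = cat_tokens.reshape(-1)                # (Nc * d,)
    out = np.maximum(flat @ W1, 0.0) @ W2        # ReLU hidden layer -> (n_hubs * d,)
    return out.reshape(n_hubs, cat_tokens.shape[1])


def init_sequence_hubs(seq_tokens, n_hubs):
    # "Pooling" strategy (assumed: contiguous chunks): split the behavior
    # sequence into n_hubs chunks and mean-pool each one into a hub.
    assert len(seq_tokens) >= n_hubs, "need at least one token per hub"
    chunks = np.array_split(seq_tokens, n_hubs, axis=0)
    return np.stack([c.mean(axis=0) for c in chunks])  # (n_hubs, d)


def init_task_hubs(task_emb, cat_tokens, W):
    # Hypothetical "hybrid" strategy: learned per-task embeddings plus a
    # projection of the pooled categorical context, shared across tasks.
    return task_emb + cat_tokens.mean(axis=0) @ W       # (n_tasks, d)


d, n_cat, n_seq = 8, 6, 20
cat = rng.normal(size=(n_cat, d))
seq = rng.normal(size=(n_seq, d))
task_emb = rng.normal(size=(2, d))                      # two hypothetical tasks

W1 = rng.normal(scale=0.1, size=(n_cat * d, 32))
W2 = rng.normal(scale=0.1, size=(32, 3 * d))
W = rng.normal(scale=0.1, size=(d, d))

cat_hubs = init_categorical_hubs(cat, W1, W2, n_hubs=3)
seq_hubs = init_sequence_hubs(seq, n_hubs=4)
task_hubs = init_task_hubs(task_emb, cat, W)
print(cat_hubs.shape, seq_hubs.shape, task_hubs.shape)  # (3, 8) (4, 8) (2, 8)
```

Pooling keeps sequence hubs cheap and anchored in actual item embeddings, while the MLP lets categorical hubs mix fields before the first interaction layer; the hub counts here are the knobs that Figure 4's sensitivity analysis varies.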