Aggregate and Broadcast: Scalable and Efficient Feature Interaction for Recommender Systems
Kaiyuan Li, Yongxiang Tang, Wenzheng Shu, Yanxiang Zeng, Chao Wang, Yanhua Cheng, Xialong Liu, Peng Jiang
TL;DR
The paper tackles the challenge of expressive yet scalable feature interaction for large-scale recommender systems. It introduces INFNet, a hub-token mediated aggregate-and-broadcast architecture that keeps the token width unchanged across layers and achieves linear complexity, enabling task-aware interaction across categorical, sequential, and task features. Through a two-phase interaction (global aggregation via cross-attention to compact hubs, followed by affine broadcast back to local tokens) and multi-task optimization, INFNet outperforms strong baselines and scales well on a public benchmark and a large industrial dataset, and online deployment improves revenue by +1.587% and CTR by +1.155%. The work provides empirical evidence of improved information flow, including ablations and visualizations, and offers practical guidance on hub budgets, initialization, and efficiency trade-offs for industrial settings.
Abstract
Feature interaction is a core ingredient in ranking models for large-scale recommender systems, yet making it both expressive and efficiently scalable remains challenging. Exhaustive pairwise interaction is powerful but incurs quadratic complexity in the number of tokens/features, while many efficient alternatives rely on restrictive structures that limit information exchange. We further identify two common bottlenecks in practice: (1) early aggregation of behavior sequences compresses fine-grained signals, making it difficult for deeper layers to reuse item-level details; and (2) late fusion injects task signals only at the end, preventing task objectives from directly guiding the interaction process. To address these issues, we propose the Information Flow Network (INFNet), a lightweight architecture that enables scalable, task-aware feature interaction with linear complexity. INFNet represents categorical features, behavior sequences, and task identifiers as tokens, and introduces a small set of hub tokens for each group to serve as communication hubs. Interaction is realized through an efficient aggregate-and-broadcast information flow: hub tokens aggregate global context across groups via cross-attention, and a lightweight gated broadcast unit injects the refined context back to update the categorical, sequence, and task tokens. Because the number of tokens is unchanged from layer to layer, the design can be stacked while retaining item-level signals within behavior sequences, and it enables task-guided interaction throughout the network, while reducing interaction cost from quadratic to linear in the number of feature tokens. Experiments on a public benchmark and a large-scale industrial dataset demonstrate that INFNet consistently outperforms strong baselines and exhibits favorable scaling behavior. In a commercial online advertising system, deploying INFNet improves revenue by +1.587% and click-through rate by +1.155%.
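To make the aggregate-and-broadcast flow described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names (`aggregate`, `broadcast`, `score_counts`) are hypothetical, learned projection matrices are omitted, and the sigmoid gate over a mean-pooled hub context is an assumed stand-in for the paper's gated broadcast unit. It only illustrates two properties stated above: the number of tokens is preserved, and the cost grows linearly in the number of tokens n for a fixed hub budget m.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(hubs, tokens):
    """Cross-attention: each of the m hubs attends over all n tokens (n*m scores)."""
    d = len(hubs[0])
    scale = 1.0 / math.sqrt(d)
    updated = []
    for hub in hubs:
        weights = softmax([dot(hub, tok) * scale for tok in tokens])
        context = [sum(w * tok[k] for w, tok in zip(weights, tokens)) for k in range(d)]
        updated.append([h + c for h, c in zip(hub, context)])  # residual update
    return updated


def broadcast(tokens, hubs):
    """Assumed gated broadcast: every token gets an elementwise gated update
    from the mean-pooled hub context. The token count stays the same."""
    d = len(tokens[0])
    pooled = [sum(hub[k] for hub in hubs) / len(hubs) for k in range(d)]
    updated = []
    for tok in tokens:
        gate = [1.0 / (1.0 + math.exp(-tok[k] * pooled[k])) for k in range(d)]
        updated.append([tok[k] + gate[k] * pooled[k] for k in range(d)])
    return updated


def score_counts(n, m):
    """Interaction operations: full pairwise (n*n) versus hub-mediated
    (n*m attention scores plus n gated token updates)."""
    return n * n, n * m + n


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
    hubs = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
    hubs = aggregate(hubs, tokens)
    tokens = broadcast(tokens, hubs)
    print(len(tokens), len(hubs), score_counts(1000, 8))
```

With 1,000 tokens and 8 hubs, this counting gives 1,000,000 pairwise scores against 9,000 hub-mediated operations, which is the quadratic-to-linear reduction the abstract refers to. Stacking several such layers keeps all item-level tokens available to deeper layers, since neither step pools the sequence away.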
