BehaveGPT: A Foundation Model for Large-scale User Behavior Modeling
Jiahui Gong, Jingtao Ding, Fanjin Meng, Chen Yang, Hong Chen, Zuojian Wang, Haisheng Lu, Yong Li
TL;DR
BehaveGPT is a transformer-based foundation model for large-scale user behavior modeling, trained with a pretraining paradigm based on distributionally robust optimization (DRO) that addresses the head-tail imbalance in highly skewed behavior data. The architecture combines four embedding streams, Flash Attention, and an MLP head, and supports next behavior prediction, new behavior adaptation, long-term generation, and cross-domain transfer. Experiments on the Honor, Mobile, and Tencent datasets show substantial gains in macro and weighted recall, robust adaptation to new behaviors, diverse long-term generation, and cross-domain improvements of over 10%, supported by analyses of scaling laws and tail behavior handling. The results indicate that a domain-specific foundation model with distributionally robust pretraining can outperform both LLM-backed baselines and traditional recommender systems, with practical implications for scalable, cross-domain user behavior prediction.
Abstract
In recent years, foundation models have revolutionized the fields of language and vision, demonstrating remarkable abilities in understanding and generating complex data. Similar advances in user behavior modeling have been limited, largely because behavioral data is complex and the temporal and contextual relationships in user activities are difficult to capture. To address this, we propose BehaveGPT, a foundation model designed specifically for large-scale user behavior prediction. Built on a transformer-based architecture and a novel pretraining paradigm, BehaveGPT is trained on vast user behavior datasets, allowing it to learn complex behavior patterns and support a range of downstream tasks, including next behavior prediction, long-term generation, and cross-domain adaptation. Our approach introduces a pretraining paradigm based on distributionally robust optimization (DRO) and tailored to user behavior data, which improves model generalization and transferability by modeling head and tail behaviors equitably. Extensive experiments on real-world datasets demonstrate that BehaveGPT outperforms state-of-the-art baselines, achieving more than a 10% improvement in macro and weighted recall. Furthermore, we present the first measurement of a scaling law in the user behavior domain, conducted on the Honor dataset, which provides insights into how model performance scales with data and parameter size.
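To make the idea of DRO-based pretraining concrete, the following is a minimal sketch of a group-DRO style loss, assuming behaviors are split into a "head" group and a "tail" group. The abstract does not specify the exact formulation used by BehaveGPT, so the grouping, the exponentiated weight update, and every name here (`group_dro_loss`, `step_size`, `group_of_behavior`) are illustrative assumptions rather than the authors' method.

```python
import numpy as np

def cross_entropy(logits, targets):
    """Per-sample cross-entropy from raw logits (numerically stable)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets]

def group_dro_loss(per_sample_loss, group_ids, group_weights, step_size=0.1):
    """One reweighting step of a group-DRO style objective (assumed form).

    Groups with a higher average loss receive a larger weight, so the
    training signal is not dominated by frequent (head) behaviors.
    """
    n_groups = len(group_weights)
    group_losses = np.zeros(n_groups)
    for g in range(n_groups):
        mask = group_ids == g
        if mask.any():
            group_losses[g] = per_sample_loss[mask].mean()
    # Exponentiated update: up-weight the groups that are currently worst.
    new_weights = group_weights * np.exp(step_size * group_losses)
    new_weights = new_weights / new_weights.sum()
    robust_loss = float(new_weights @ group_losses)
    return robust_loss, new_weights, group_losses

# Toy batch: 4 behavior types; types 0 and 1 are "head", 2 and 3 are "tail".
group_of_behavior = np.array([0, 0, 1, 1])  # hypothetical head/tail split
logits = np.array([
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0],  # model predicts a head behavior...
])
targets = np.array([0, 1, 0, 2])  # ...but the last true label is a tail behavior

per_sample_loss = cross_entropy(logits, targets)
group_ids = group_of_behavior[targets]
robust_loss, weights, group_losses = group_dro_loss(
    per_sample_loss, group_ids, group_weights=np.array([0.5, 0.5])
)
print("average (ERM) loss:", per_sample_loss.mean())
print("robust loss:", robust_loss)
print("group weights [head, tail]:", weights)
```

In this toy batch the single tail example is misclassified, so the tail group's weight grows and the robust loss exceeds the plain average loss. That is the mechanism by which a DRO-style objective keeps rare behaviors from being ignored during training.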
