LUMOS: Large User MOdels for User Behavior Prediction
Dhruv Nigam
TL;DR
LUMOS addresses the challenge of predicting diverse future user behaviors at scale without task-specific models or hand-crafted features. It proposes a unified encoder–decoder transformer that learns from raw daily activity. The model uses a novel cross-attention mechanism conditioned on known future events and a multi-modal tokenization scheme that fuses transactions, event context, and static demographics. It learns multiple behavior dimensions jointly with uncertainty-weighted multi-task losses and is trained and deployed at production scale. Offline, it improves ROC-AUC by 0.025 on average and reduces MAPE by 4.6% relative to task-specific baselines; online, it raises Daily Active Users by 3.15%. The work shows that future-context conditioning and rich multi-modal representations enable scalable, end-to-end user behavior modeling with practical business impact on event-driven platforms.
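The summary does not spell out the form of the "uncertainty-weighted multi-task losses". The sketch below assumes the common homoscedastic-uncertainty weighting of Kendall et al. (2018), with one learned log-variance `s_i = log(sigma_i^2)` per task. The function name `uncertainty_weighted_loss` and its arguments are hypothetical and are not taken from LUMOS.

```python
import numpy as np


def uncertainty_weighted_loss(task_losses, log_vars):
    """Combine per-task losses using learned uncertainty weights.

    ASSUMPTION: Kendall et al. (2018)-style weighting, not confirmed as the
    exact LUMOS formulation:
        total = sum_i 0.5 * exp(-s_i) * L_i + 0.5 * s_i,  with s_i = log(sigma_i^2)
    task_losses: per-task scalar losses (e.g. BCE for binary tasks, MSE for regression)
    log_vars:    one learnable log-variance s_i per task
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    if task_losses.shape != log_vars.shape:
        raise ValueError("expected exactly one log-variance per task")
    precision = np.exp(-log_vars)  # 1 / sigma_i^2 scales each task's loss
    # The 0.5 * s_i term stops the model from inflating sigma_i without bound.
    return float(np.sum(0.5 * precision * task_losses + 0.5 * log_vars))


# Example: a binary task with loss 0.7 and a regression task with loss 2.3.
total = uncertainty_weighted_loss([0.7, 2.3], log_vars=[0.0, 1.0])
```

For a fixed task loss `L_i`, setting the derivative of `0.5 * exp(-s_i) * L_i + 0.5 * s_i` to zero gives `s_i = log(L_i)`. The learned weights therefore shrink the contribution of tasks whose losses are large or noisy, which balances tasks with different loss scales without manual tuning.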
Abstract
User behavior prediction at scale remains a critical challenge for online B2C platforms. Traditional approaches rely heavily on task-specific models and domain-specific feature engineering, which is time-consuming, computationally expensive, and dependent on domain expertise, and therefore does not scale. We present LUMOS (Large User MOdel Series), a transformer-based architecture that eliminates task-specific models and manual feature engineering by learning multiple tasks jointly from raw user activity data alone. LUMOS introduces a novel cross-attention mechanism that conditions predictions on known future events (e.g., holidays and sales), enabling the model to answer questions such as "how will upcoming holidays affect user engagement?" The architecture also employs multi-modal tokenization, which combines user transactions, event context, and static user demographic attributes into rich representations processed through specialized embedding pathways. Through extensive experiments on a production dataset of 275 billion user activity tokens from 250 million users, we show that LUMOS outperforms traditional task-specific models. Across 5 tasks with established baselines, it achieves an average improvement of 0.025 in ROC-AUC on binary classification tasks and a 4.6% reduction in MAPE on regression tasks. Online A/B testing confirms that these improvements translate into measurable business impact, with a 3.15% increase in Daily Active Users.
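The abstract names two mechanisms without giving their equations: multi-modal daily tokens and cross-attention conditioned on known future events. The NumPy sketch below shows one plausible reading of how they fit together. It is not the LUMOS implementation. Every function and parameter name is hypothetical, and the following are all assumptions:

- summing the per-modality projections;
- zeroing transaction features for future days;
- using future-day tokens as decoder queries over encoded past activity;
- using a single attention head.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed_days(tx_feats, event_ids, static_feats, W_tx, E_event, W_static):
    """HYPOTHETICAL multi-modal tokenization: one d-dim token per day.

    tx_feats:     (T, f_tx) daily transaction features
    event_ids:    (T,)      id of the known event on that day (0 = no event)
    static_feats: (f_s,)    static demographics, broadcast to every day
    Each modality has its own projection; summing them is an assumption.
    """
    return tx_feats @ W_tx + E_event[event_ids] + static_feats @ W_static


def cross_attention(queries, memory, W_q, W_k, W_v):
    """Single-head scaled dot-product cross-attention.

    queries: (T_future, d) future-day tokens carrying known-event context
    memory:  (T_past, d)   representations of past daily activity
    Returns (T_future, d) context vectors and (T_future, T_past) weights.
    """
    q, k, v = queries @ W_q, memory @ W_k, memory @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each future day attends over past days
    return weights @ v, weights


# Toy dimensions (illustrative only).
d, f_tx, f_s, n_events = 16, 8, 4, 5
T_past, T_future = 30, 7
W_tx = rng.normal(size=(f_tx, d))
E_event = rng.normal(size=(n_events, d))
W_static = rng.normal(size=(f_s, d))
static = rng.normal(size=f_s)

past = embed_days(rng.normal(size=(T_past, f_tx)),
                  rng.integers(0, n_events, T_past), static, W_tx, E_event, W_static)
# Future transactions are unknown, so their features are zeroed (assumption);
# only the known event ids and static attributes inform these tokens.
future = embed_days(np.zeros((T_future, f_tx)),
                    rng.integers(0, n_events, T_future), static, W_tx, E_event, W_static)

W_q, W_k, W_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
# A full model would first pass `past` through encoder layers; skipped here.
ctx, attn = cross_attention(future, past, W_q, W_k, W_v)
```

In this reading, every forecast day receives a context vector that mixes past behavior according to how relevant that behavior is to the day's known event. For example, a holiday query can place more weight on past days with similar event embeddings. A prediction head for each task would then read from `ctx`.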
