System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes
Arpit Agarwal, Nicolas Usunier, Alessandro Lazaric, Maximilian Nickel
TL;DR
This paper addresses the misalignment between engagement metrics and true user utility in recommender systems by modeling user return behavior with a two-component Hawkes process that captures System-1 (impulsive) and System-2 (utility-driven) dynamics. It proposes a generative framework with item embeddings ${\mathbf{v}}_j$ and user embeddings ${\mathbf{u}}^1_i, {\mathbf{u}}^2_i$, where the arrival intensity $\lambda_i(t)$ combines short-lived System-1 effects and longer-lasting System-2 effects via $\alpha^1_{it}=\phi({\mathbf{v}_{S_{it}}^T}{\mathbf{u}}^1_i)$ and $\alpha^2_{it}=\phi({\mathbf{v}_{S_{it}}^T}{\mathbf{u}}^2_i)$ with exponential decays $e^{-\beta^1_i(t-t')}$ and $e^{-\beta^2_i(t-t')}$. The authors prove identifiability of the two components under mild conditions and establish consistency of maximum likelihood estimation, enabling separation of utility and allure signals from historical interactions. Synthetic experiments demonstrate accurate recovery of parameters and show that content optimization based on the estimated utility ${\mathbf{u}}^2_i$ yields higher long-term utility than optimization grounded in engagement signals. The work suggests a practical path toward utility-aligned recommendations and highlights directions for extending to non-stationary settings and richer session representations. Overall, the approach provides theoretical guarantees for disentangling dual-system influences and offers a principled shift from engagement-focused to utility-focused content optimization with potential societal and platform-level benefits.
Abstract
Recommender systems are an important part of the modern human experience whose influence ranges from the food we eat to the news we read. Yet, there is still debate as to what extent recommendation platforms are aligned with users' goals. A core issue fueling this debate is the challenge of inferring a user's utility from engagement signals such as likes, shares, and watch time, which are the primary metrics platforms use to optimize content. This is because users' utility-driven decision processes (which we refer to as System-2), e.g., reading news that is relevant to them, are often confounded by their impulsive decision processes (which we refer to as System-1), e.g., spending time on click-bait news. As a result, it is difficult to infer whether an observed engagement is utility-driven or impulse-driven. In this paper, we explore a new approach to recommender systems in which we infer user utility from users' probability of returning to the platform rather than from engagement signals. Our intuition is that users tend to return to a platform in the long run if it creates utility for them, while purely engagement-driven interactions that do not add utility may affect user return in the short term but will not have a lasting effect. We propose a generative model in which past content interactions affect users' arrival rates through a self-exciting Hawkes process. These arrival rates to the platform combine System-1 and System-2 decision processes. The System-2 arrival intensity depends on utility and has a long-lasting effect, while the System-1 intensity depends on instantaneous gratification and tends to vanish rapidly. We show analytically that, given samples, it is possible to disentangle System-1 and System-2 and thereby enable content optimization based on user utility. We conduct experiments on synthetic data to demonstrate the effectiveness of our approach.
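To make the two-timescale intuition concrete, the following is a minimal sketch of a two-component Hawkes intensity for a single user. It is not the authors' implementation. Several pieces are assumptions made for illustration: the function names `softplus` and `hawkes_components`, the use of softplus as the link function $\phi$, the constant baseline rate `mu`, and the representation of each past session by a single item embedding. Each past session at time $t'$ contributes a System-1 term and a System-2 term, each decaying exponentially in $t - t'$ at its own rate.

```python
import numpy as np

def softplus(x):
    # Assumed choice for the link function phi: stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def hawkes_components(t, event_times, item_embs, u1, u2, beta1, beta2):
    """Return the (System-1, System-2) excitation terms at time t.

    event_times: shape (n,), times t' of past sessions
    item_embs:   shape (n, d), one item embedding v per past session
    u1, u2:      shape (d,), System-1 and System-2 user embeddings
    beta1, beta2: decay rates; beta1 > beta2 makes System-1 short-lived
    """
    event_times = np.asarray(event_times, dtype=float)
    item_embs = np.asarray(item_embs, dtype=float)
    past = event_times < t                    # only earlier sessions excite
    dt = t - event_times[past]
    alpha1 = softplus(item_embs[past] @ u1)   # System-1 jump sizes
    alpha2 = softplus(item_embs[past] @ u2)   # System-2 jump sizes
    s1 = float(np.sum(alpha1 * np.exp(-beta1 * dt)))
    s2 = float(np.sum(alpha2 * np.exp(-beta2 * dt)))
    return s1, s2

def intensity(t, event_times, item_embs, u1, u2, beta1, beta2, mu=0.0):
    # Total arrival intensity: assumed baseline mu plus both components.
    s1, s2 = hawkes_components(t, event_times, item_embs, u1, u2, beta1, beta2)
    return mu + s1 + s2

# One session at time 0 with an alluring but low-utility item (toy values).
events = [0.0]
items = [[1.0, 0.0]]
u1 = np.array([2.0, 0.0])   # strong impulsive response
u2 = np.array([0.5, 0.0])   # weaker utility response
for t in (0.1, 10.0):
    s1, s2 = hawkes_components(t, events, items, u1, u2, beta1=5.0, beta2=0.1)
    print(f"t={t}: System-1={s1:.4f}, System-2={s2:.4f}")
```

With these toy values, the System-1 term is the larger of the two shortly after the session, while only the System-2 term remains non-negligible at the later time. This is the separation in timescales that the return-based approach relies on.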
