System-2 Recommenders: Disentangling Utility and Engagement in Recommendation Systems via Temporal Point-Processes

Arpit Agarwal; Nicolas Usunier; Alessandro Lazaric; Maximilian Nickel

TL;DR

This paper addresses the misalignment between engagement metrics and true user utility in recommender systems by modeling user return behavior with a two-component Hawkes process that captures System-1 (impulsive) and System-2 (utility-driven) dynamics. It proposes a generative framework with item embeddings ${\mathbf{v}}_j$ and user embeddings ${\mathbf{u}}^1_i, {\mathbf{u}}^2_i$, where the arrival intensity $\lambda_i(t)$ combines short-lived System-1 effects and longer-lasting System-2 effects via $\alpha^1_{it}=\phi({\mathbf{v}_{S_{it}}^T}{\mathbf{u}}^1_i)$ and $\alpha^2_{it}=\phi({\mathbf{v}_{S_{it}}^T}{\mathbf{u}}^2_i)$ with exponential decays $e^{-\beta^1_i(t-t')}$ and $e^{-\beta^2_i(t-t')}$. The authors prove identifiability of the two components under mild conditions and establish consistency of maximum likelihood estimation, enabling separation of utility and allure signals from historical interactions. Synthetic experiments demonstrate accurate recovery of parameters and show that content optimization based on the estimated utility ${\mathbf{u}}^2_i$ yields higher long-term utility than optimization grounded in engagement signals. The work suggests a practical path toward utility-aligned recommendations and highlights directions for extending to non-stationary settings and richer session representations. Overall, the approach provides theoretical guarantees for disentangling dual-system influences and offers a principled shift from engagement-focused to utility-focused content optimization with potential societal and platform-level benefits.
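
As a worked form of the intensity described above, one plausible way to write the two-component Hawkes intensity is shown here. This is a sketch assembled from the quantities named in the summary, not necessarily the paper's exact equation: the baseline rate $\mu_i$ and the sum over past session times $t'$ are assumptions, and the per-session weights are written with the session index $t'$.

$$
\lambda_i(t) \;=\; \mu_i \;+\; \sum_{t' < t} \Big[ \alpha^1_{it'}\, e^{-\beta^1_i (t - t')} \;+\; \alpha^2_{it'}\, e^{-\beta^2_i (t - t')} \Big], \qquad \alpha^k_{it'} = \phi\big(\mathbf{v}_{S_{it'}}^T \mathbf{u}^k_i\big), \; k \in \{1, 2\}.
$$

Under this reading, a System-1 term that is "short-lived" relative to System-2 corresponds to $\beta^1_i > \beta^2_i$: both components jump at each session, but the first decays back toward zero much faster.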

Abstract

Recommender systems are an important part of the modern human experience, with influence ranging from the food we eat to the news we read. Yet there is still debate about the extent to which recommendation platforms are aligned with users' goals. A core issue fueling this debate is the challenge of inferring a user's utility from engagement signals such as likes, shares, and watch time, which are the primary metrics platforms use to optimize content. The difficulty is that users' utility-driven decision processes (which we refer to as System-2), e.g., reading news that is relevant to them, are often confounded by their impulsive decision processes (which we refer to as System-1), e.g., spending time on click-bait news. As a result, it is difficult to infer whether an observed engagement is utility-driven or impulse-driven. In this paper we explore a new approach to recommender systems in which we infer user utility from users' probability of returning to the platform rather than from engagement signals. Our intuition is that users tend to return to a platform in the long run if it creates utility for them, while purely engagement-driven interactions that do not add utility may affect user return in the short term but will not have a lasting effect. We propose a generative model in which past content interactions affect the arrival rates of users through a self-exciting Hawkes process. These arrival rates combine both System-1 and System-2 decision processes. The System-2 arrival intensity depends on utility and has a long-lasting effect, while the System-1 intensity depends on instantaneous gratification and tends to vanish rapidly. We show analytically that, given samples, it is possible to disentangle System-1 and System-2, which allows content optimization based on user utility. We conduct experiments on synthetic data to demonstrate the effectiveness of our approach.
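
The short-term versus long-term behavior described in the abstract can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: the softplus link standing in for $\phi$, the embeddings, the decay rates, and the helper names `softplus`, `dot`, and `components` are all hypothetical choices made for illustration.

```python
import math

def softplus(x):
    # Assumed stand-in for the link function phi; the source only names it phi.
    return math.log1p(math.exp(x))

def dot(a, b):
    # Plain inner product of two equal-length lists.
    return sum(x * y for x, y in zip(a, b))

def components(t, sessions, u1, u2, beta1, beta2):
    """Return the (System-1, System-2) contributions to the intensity at time t.

    sessions: list of (time, item_embedding) pairs for past visits.
    Each past session adds phi(v . u_k) * exp(-beta_k * elapsed) to component k.
    """
    s1 = s2 = 0.0
    for t_prev, v in sessions:
        if t_prev < t:
            dt = t - t_prev
            s1 += softplus(dot(v, u1)) * math.exp(-beta1 * dt)
            s2 += softplus(dot(v, u2)) * math.exp(-beta2 * dt)
    return s1, s2

# Hypothetical user: u1 is the impulsive (System-1) embedding,
# u2 is the utility (System-2) embedding.
u1 = [2.0, 0.0]
u2 = [0.0, 1.0]
clickbait = [1.0, 0.0]          # item aligned with u1 only
sessions = [(0.0, clickbait)]   # a single past session at time 0
beta1, beta2 = 5.0, 0.1         # System-1 decays much faster than System-2

soon = components(0.1, sessions, u1, u2, beta1, beta2)
later = components(5.0, sessions, u1, u2, beta1, beta2)
```

With these made-up numbers, the System-1 term dominates shortly after the session and is negligible a few time units later, while the System-2 term is still noticeable. That time-scale separation is what return times are meant to reveal.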

Paper Structure (22 sections, 4 theorems, 16 equations, 2 figures)

Introduction
Preliminaries and Problem Setup
Dual System and Inconsistent Preferences
Temporal Point Processes and The Hawkes Process
Our Recommender Model
Goal
Identifiability and Consistency
Experiments
Effect of sample size on the estimation error
Effect of gap between $\beta^1$ and $\beta^2$ on estimation error
Comparing utility and engagement maximization in terms of (dis-)similarity between $\mathbf{u}^1$ and $\mathbf{u}^2$
Comparing utility and engagement maximization in terms of inventory of $\mathbf{u}^2$ items
Discussion
Alternative Platform Objectives
Alternative Session Summarizing Techniques
...and 7 more sections

Key Result

Lemma 1

A class of Hawkes processes $\{\lambda(t; \boldsymbol{\theta}): \boldsymbol{\theta} \in \Theta\}$ is identifiable if the corresponding trigger intensity $\kappa$ is identifiable, i.e., if $\kappa(t;\boldsymbol{\eta}_1) = \kappa(t; \boldsymbol{\eta}_2)$ $\forall t$, then $\boldsymbol{\eta}_1 = \boldsymbol{\eta}_2$...
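
To see why identifiability of the trigger intensity is not automatic for a two-component kernel, consider a simplified scalar kernel. This is an illustrative assumption, not the paper's stated setting: it uses fixed weights $\alpha^1, \alpha^2 > 0$ and decay rates $\beta^1, \beta^2 > 0$.

$$
\kappa(t; \boldsymbol{\eta}) = \alpha^1 e^{-\beta^1 t} + \alpha^2 e^{-\beta^2 t}, \qquad \boldsymbol{\eta} = (\alpha^1, \alpha^2, \beta^1, \beta^2).
$$

If $\beta^1 = \beta^2 = \beta$, the kernel collapses to

$$
\kappa(t; \boldsymbol{\eta}) = (\alpha^1 + \alpha^2)\, e^{-\beta t},
$$

so $(\alpha^1, \alpha^2) = (1, 2)$ and $(\alpha^1, \alpha^2) = (2, 1)$ give the same $\kappa(t)$ for all $t$, and only the sum $\alpha^1 + \alpha^2$ can be recovered. Even when $\beta^1 \neq \beta^2$, swapping the two component labels leaves $\kappa$ unchanged, so an ordering such as $\beta^1 > \beta^2$ is needed to pin down which component is which. This is consistent with the outline's experiment on the gap between $\beta^1$ and $\beta^2$, although the exact conditions used in the paper's theorems are not reproduced here.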

Figures (2)

Figure 1: The error in parameter estimation as a function of number of samples on the left and gap in decay rates on the right.
Figure 2: The blue curve and the red curve show the session utility obtained by optimizing items with respect to estimated $\mathbf{u}^1+\mathbf{u}^2$ and estimated $\mathbf{u}^2$, respectively, plotted as a function of $\mathbf{u}^1,\mathbf{u}^2$ dissimilarity on the left and $\mathbf{u}^2$ inventory on the right.

Theorems & Definitions (8)

Definition 1: Identifiability
Lemma 1: Guo+18
Theorem 1
proof
Definition 2: Consistency
Theorem 2
Lemma 2
proof