New User Event Prediction Through the Lens of Causal Inference

Henry Shaowu Yuchi; Shixiang Zhu; Li Dong; Yigit M. Arisoy; Matthew C. Spencer

TL;DR

The paper addresses cold-start next-event prediction for new users with limited history by treating the event history as an intervention and the user category as a confounder, which enables unbiased counterfactual estimation via inverse propensity weighting. It introduces a category-agnostic intensity framework and an alternating learning algorithm that updates propensity weights and model parameters in turn, supported by a nonparametric history-transition estimator and a theoretical analysis of the bias-variance trade-off induced by bin discretization. Empirical results on synthetic data and two real datasets (Netflix ratings and Amazon seller contact) show consistent improvements over standard neural and Hawkes-based point processes, especially in heterogeneous and sparse regimes. The approach offers a principled, scalable path to robust, category-agnostic event prediction for large, dynamic user bases in cold-start situations.

Abstract

Modeling and analysis of event series generated by users with heterogeneous behavioral patterns arise in many everyday applications, including credit card fraud detection, online platform user recommendation, and social network analysis. The most commonly adopted approach to this task is to assign users to behavior-based categories and analyze each category separately. However, this requires extensive data to fully understand user behavior, which makes it hard to model newcomers who have little history. In this work, we propose a novel discrete event prediction framework for new users with limited history, without needing to know the user's category. We treat the user event history as the "treatment" for future events and the user category as the key confounder. The prediction problem can thus be framed as counterfactual outcome estimation, where each event is re-weighted by its inverse propensity score. We demonstrate the improved performance of the proposed framework with a numerical simulation study and two real-world applications: Netflix rating prediction and seller contact prediction for customer support at Amazon.
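The re-weighting step described in the abstract can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's neural model: inter-event gaps are assumed exponential so the weighted maximum likelihood estimate has a closed form, the function names `ipw_weights` and `weighted_exp_rate` are hypothetical, and the propensity values are made up for illustration.

```python
import numpy as np

def ipw_weights(propensity, clip=1e-3):
    """Inverse propensity weights, normalised to mean 1.

    `propensity` is assumed to hold p(history | category) for each event,
    as a probability over discretised histories. Values are clipped from
    below so that very rare histories do not produce extreme weights.
    """
    p = np.clip(np.asarray(propensity, dtype=float), clip, 1.0)
    w = 1.0 / p
    return w / w.mean()

def weighted_exp_rate(gaps, weights):
    """Weighted MLE of an exponential rate.

    Maximises sum_i w_i * (log(lam) - lam * gap_i), whose stationary
    point is lam = sum_i w_i / sum_i (w_i * gap_i).
    """
    gaps = np.asarray(gaps, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights.sum() / (weights * gaps).sum()

# Toy data (illustrative only): long gaps come from histories that are
# under-represented in the observed data (low propensity).
gaps = np.array([0.5, 1.0, 2.0, 4.0])
propensity = np.array([0.8, 0.8, 0.2, 0.2])

w = ipw_weights(propensity)
rate_unweighted = weighted_exp_rate(gaps, np.ones_like(gaps))
rate_weighted = weighted_exp_rate(gaps, w)
print(rate_unweighted, rate_weighted)
```

In this toy setting the weighted estimate gives more influence to the under-represented long gaps, so the fitted rate is lower than the unweighted one. In the paper's setting the same weights would multiply the per-event log-likelihood terms of the point process model instead.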

Paper Structure (40 sections, 6 theorems, 61 equations, 6 figures, 4 tables, 1 algorithm)

Introduction
Related Works
Preliminaries
Marked Temporal Point Processes
Neural Point Processes
New User Event Prediction
Category-agnostic Intensity Estimation
Inverse Propensity Re-weighting
Conditional History Transition
Sensitivity Analysis
Learning and Inference
Experiments
Synthetic Studies
Experimental Setup
Results
...and 25 more sections

Key Result

Lemma 1

The history embeddings $h_0, h_1,\dots, h_i$ and $h(x)$ possess the Markov property. This follows from the definition of the random history variable $h(x)$, under which $h(x)$ is fully specified by $x$ and $h_n$.

Figures (6)

Figure 1: The goal is to predict the next event for a new user without knowing its category. The distribution of the next event is influenced by its category even with the same event history. Each event $x$ consists of its occurrence time $t$ and associated mark data $m$. The conditional intensity $\lambda(x|\mathcal{H}_t, C)$ represents the occurrence rate of the next event $x$ given its history $\mathcal{H}_t$ and the user's category $C$.
Figure 2: Causal DAG between user category, history variables, and category-agnostic intensity. The lowercase notations represent the realizations of these variables. We use $n$ to denote the number of observed events in the history and $h(x)$ (or simply $h$) denotes the history when we observe the next ($n+1$)-th event at $x$.
Figure 3: Histograms of $p(h_i | C)$ and the conditional transition probability $p(h_i | h_{i-1}, C)$ for the three categories $C=1, 2, 3$ with $1/\delta=20$. The conditional transition probabilities differ visibly across user categories, reflecting each category's distinct behavior pattern.
Figure 4: Category-wise prediction MAE of synthetic experiments.
Figure 5: Convergence analysis.
...and 1 more figure
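The conditional transition probabilities shown in Figure 3 can be estimated by counting transitions between bins. The sketch below is illustrative and rests on assumptions not stated in the source: the history variable is taken to be a scalar in $[0, 1]$, it is discretised into $1/\delta$ equal-width bins, and the function name `transition_matrix` and the additive smoothing constant are hypothetical choices.

```python
import numpy as np

def transition_matrix(sequences, n_bins=20, smoothing=1.0):
    """Estimate p(h_i | h_{i-1}) for one user category by binned counting.

    `sequences` is a list of per-user history sequences with values in
    [0, 1] (an assumption of this sketch). Each value is mapped to one of
    `n_bins` equal-width bins, consecutive pairs are counted, and each row
    is normalised. `smoothing` is added to every cell so that empty rows
    still yield a valid distribution.
    """
    counts = np.full((n_bins, n_bins), float(smoothing))
    for seq in sequences:
        values = np.asarray(seq, dtype=float)
        # Map to bin indices; the value 1.0 falls into the last bin.
        bins = np.minimum((values * n_bins).astype(int), n_bins - 1)
        for prev_bin, next_bin in zip(bins[:-1], bins[1:]):
            counts[prev_bin, next_bin] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# Toy sequence that alternates between the lowest and highest bins.
P = transition_matrix([[0.01, 0.99, 0.01, 0.99]], n_bins=20)
print(P.shape, P[0].argmax(), P[19].argmax())
```

Running this once per category gives one matrix per category, which can then be compared in the way Figure 3 compares the three categories. A larger `n_bins` resolves finer structure but leaves fewer observations per cell, which is the bias-variance tension the paper analyses for the bin size.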

Theorems & Definitions (13)

Definition 1: Random history variable
Lemma 1: Markov property
Definition 2: Dependence of users' future event
Definition 3: Category-agnostic intensity
Lemma 2: Category-agnostic probability density
Proposition 1: Weighted maximum log-likelihood estimation
Proposition 2: Improvement in binning
Proposition 3: Optimal bin size
Remark 1
Remark 2
...and 3 more
