Inference of Utilities and Time Preference in Sequential Decision-Making

Haoyang Cao; Zhengqi Wu; Renyuan Xu

TL;DR

The paper infers individual investment preferences from sequential decisions by formulating a continuous-time stochastic control problem with two utility functions, $U_1$ and $U_2$, and a general time-varying discounting scheme $\beta$. Time inconsistency is handled via state augmentation, with rigorous results on dynamic programming, viscosity solutions, and identifiability of both the utilities and the discounting under finite and infinite horizons. To operationalize the approach, the authors develop a discrete-time, entropy-regularized MDP and a maximum likelihood estimator, showing that the true preference parameters are stationary points of the log-likelihood and that the log-likelihood is locally concave, which enables fast gradient-based convergence. Two numerical experiments (Merton's problem and a model with unhedgeable risk) illustrate parameter recovery and how discounting shapes consumption and investment strategies. The framework advances personalized robo-advising and offers generalizable tools for preference learning in domains such as healthcare, economics, and AI.

Abstract

This paper introduces a novel stochastic control framework to enhance the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme of a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency issue through state augmentation and the establishment of the dynamic programming principle and the verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, facilitating the fast convergence of our proposed algorithm. Practical effectiveness and efficiency are showcased through two numerical examples, including Merton's problem and an investment problem with unhedgeable risks. Our proposed framework not only advances financial technology by improving personalized investment advice but also contributes broadly to other fields such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
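The estimation idea described above (an entropy-regularized discrete-time MDP whose softmax policy defines a likelihood over observed actions) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy integer wealth grid with one unit of income per step, the CRRA utility with risk-aversion parameter `theta`, the exponential discount `exp(-rho * t)` standing in for the general time-varying scheme, the temperature `lam`, and all function names. A coarse grid search replaces the paper's gradient-based algorithm to keep the example short.

```python
import numpy as np

def crra(c, theta):
    """CRRA utility; theta is the relative risk aversion (theta != 1)."""
    return c ** (1.0 - theta) / (1.0 - theta)

def soft_policies(theta, rho, W=6, T=5, lam=0.5):
    """Backward soft (entropy-regularized) value iteration on a toy problem.

    Returns log_pi[t, w-1, c-1]: log-probability of consuming c units at
    time t with wealth w. Infeasible actions (c > w) get -inf.
    """
    w = np.arange(1, W + 1)                       # wealth levels 1..W
    c = np.arange(1, W + 1)                       # consumption levels 1..W
    feasible = c[None, :] <= w[:, None]           # cannot consume more than wealth
    # Next-wealth index after consuming c and receiving 1 unit of income.
    nxt = np.clip(w[:, None] - c[None, :] + 1, 1, W) - 1
    V = np.exp(-rho * T) * crra(w.astype(float), theta)   # terminal utility of wealth
    log_pi = np.empty((T, W, W))
    for t in reversed(range(T)):
        Q = np.exp(-rho * t) * crra(c.astype(float), theta)[None, :] + V[nxt]
        Q = np.where(feasible, Q, -np.inf)
        m = Q.max(axis=1, keepdims=True)          # stabilize the log-sum-exp
        V = (m + lam * np.log(np.exp((Q - m) / lam).sum(axis=1, keepdims=True)))[:, 0]
        log_pi[t] = (Q - V[:, None]) / lam        # softmax policy in log form
    return log_pi

def simulate(theta, rho, n_paths=500, W=6, T=5, lam=0.5, seed=0):
    """Sample (t, wealth, consumption) triples from the soft-optimal policy."""
    rng = np.random.default_rng(seed)
    pi = np.exp(soft_policies(theta, rho, W, T, lam))
    rows = []
    for _ in range(n_paths):
        w = int(rng.integers(1, W + 1))
        for t in range(T):
            c = int(rng.choice(W, p=pi[t, w - 1])) + 1
            rows.append((t, w, c))
            w = min(max(w - c + 1, 1), W)
    return np.array(rows)

def log_likelihood(params, data, **kw):
    """Sum of log pi_t(c | w) over the observed decisions."""
    theta, rho = params
    log_pi = soft_policies(theta, rho, **kw)
    t, w, c = data.T
    return log_pi[t, w - 1, c - 1].sum()

def fit_grid(data, thetas, rhos, **kw):
    """Pick the (theta, rho) pair with the highest log-likelihood on a grid."""
    scores = [(log_likelihood((th, rh), data, **kw), th, rh)
              for th in thetas for rh in rhos]
    _, th, rh = max(scores)
    return th, rh

if __name__ == "__main__":
    data = simulate(theta=2.0, rho=0.1)
    est = fit_grid(data, thetas=[1.5, 2.0, 2.5, 3.0], rhos=[0.05, 0.1, 0.3, 0.75])
    print("estimated (theta, rho):", est)
```

The softmax form of the policy is what makes the likelihood smooth in the parameters, which is why gradient-based maximization is a natural choice; the grid search above is only a stand-in that avoids step-size tuning in a short example.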

Paper Structure (16 sections, 12 theorems, 107 equations, 6 figures, 1 algorithm)


Introduction
Our framework, results, and contributions.
Related literature and comparisons to our results.
Continuous-time Framework
Finite-time Horizon
Market dynamics and client's wealth.
Client's preference.
General discounting scheme.
Preliminary Analysis
The Inverse Problem: Identifiability of the Utility Functions
Infinite-time Horizon
Discrete-time MDP with Entropy Regularization
Maximum Likelihood Estimation
Algorithm Design and Implementation
Numerical example one: Merton's problem
...and 1 more sections

Key Result

Lemma 1

Assume that $U_1,U_2\in\mathcal{U}$, and moreover that $U_1(0)=0$ and $U_2(0)=-\infty$. For any $(t,x,z)\in[0,T]\times(0,\infty)\times[0,1]$, if the policy $(\pmb{\alpha}^*,\pmb{c}^*)$ satisfies $J(t,x,z,\pmb{\alpha}^*,\pmb{c}^*)=V(t,x,z)$, then it holds almost surely that […], where $X^{\pmb{\alpha},\pmb{c}}$ solves eq:gen-wealth on $[t,T]$ given $(\pmb{\alpha},\pmb{c})=(\pmb{\alpha}^*,\pmb{c}^*)$ …

Figures (6)

Figure 1: Visualization of the log-likelihood function and its gradients. Left columns: visualization with respect to $\theta$ (under $\rho = \bar{\rho}$). Right columns: visualization with respect to $\rho$ (under $\theta = \bar{\theta}$).
Figure 2: The convergence result of Algorithm 1. The left plot shows the value of $\theta$ at each iteration, while the right plot shows the value of $\rho$.
Figure 3: Visualization of the client's consumption policy. The left plot illustrates consumption at various wealth levels under $\bar{\rho} = 0.1$, while the right plot corresponds to $\bar{\rho} = 0.75$.
Figure 4: Visualization of the log-likelihood function and its gradients. Left columns: visualization with respect to $\theta_1$ (under $\theta_2 = \bar{\theta}_2$ and $\rho = \bar{\rho}$). Middle columns: visualization with respect to $\theta_2$ (under $\theta_1 = \bar{\theta}_1$ and $\rho = \bar{\rho}$). Right columns: visualization with respect to $\rho$ (under $\theta_1 = \bar{\theta}_1$ and $\theta_2 = \bar{\theta}_2$).
Figure 5: The convergence result of Algorithm 1. The left plot shows the value of $\theta_1$ at each iteration, the middle plot shows $\theta_2$, and the right plot shows $\rho$.
...and 1 more figures

Theorems & Definitions (24)

Lemma 1
proof
Lemma 2
proof
Proposition 1: Dynamic programming principle (DPP)
proof
Proposition 2
proof
Definition 1: Viscosity solution
Proposition 3
...and 14 more
