Inference of Utilities and Time Preference in Sequential Decision-Making
Haoyang Cao, Zhengqi Wu, Renyuan Xu
TL;DR
The paper addresses the problem of inferring individual investment preferences from sequential decisions by formulating a continuous-time stochastic control problem with two utility functions, $U_1$ and $U_2$, and a general time-varying discount $\beta$. The resulting time inconsistency is handled via state augmentation, with rigorous results on dynamic programming, viscosity solutions, and the identifiability of both the utilities and the discounting under finite and infinite horizons. To operationalize the approach, the authors develop a discrete-time, entropy-regularized MDP and a maximum likelihood estimator, showing that the true preference parameters are stationary points of the log-likelihood and that the log-likelihood is locally concave, which enables fast convergence of gradient-based methods. Two numerical experiments, Merton's problem and a model with unhedgeable risk, illustrate parameter recovery and how discounting shapes consumption and investment strategies. The framework advances personalized robo-advising and offers generalizable tools for preference learning in domains such as healthcare, economics, and AI.
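A one-line calculation shows where time inconsistency comes from when the discount rate varies over time. As an illustrative assumption (our notation, not necessarily the paper's parametrization of $\beta$), write the discount factor as $D(t) = \exp\left(-\int_0^t \rho(u)\,du\right)$ for a time-varying rate $\rho$. An agent evaluating plans at time $s$ weighs date $s+h$ relative to date $s$ by

$$
\frac{D(s+h)}{D(s)} = \exp\left(-\int_s^{s+h} \rho(u)\,du\right),
$$

which depends on the evaluation time $s$ unless $\rho$ is constant. With a constant rate the ratio reduces to $e^{-\rho h}$, so the agent's later selves rank future plans in the same way; otherwise, a plan that is optimal at time $0$ need not remain optimal when re-evaluated at a later time $s$. For example, the hyperbolic factor $D(t) = 1/(1+kt)$ with $k>0$ (the case $\rho(t) = k/(1+kt)$) gives $D(s+h)/D(s) = (1+ks)/(1+k(s+h))$, which increases with $s$, so the same delay $h$ is discounted less heavily when it lies further in the future.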
Abstract
This paper introduces a novel stochastic control framework that enhances the capabilities of automated investment managers, or robo-advisors, by accurately inferring clients' investment preferences from their past activities. Our approach leverages a continuous-time model that incorporates utility functions and a generic discounting scheme with a time-varying rate, tailored to each client's risk tolerance, valuation of daily consumption, and significant life goals. We address the resulting time inconsistency through state augmentation, establishing the dynamic programming principle and a verification theorem. Additionally, we provide sufficient conditions for the identifiability of client investment preferences. To complement our theoretical developments, we propose a learning algorithm based on maximum likelihood estimation within a discrete-time Markov Decision Process framework, augmented with entropy regularization. We prove that the log-likelihood function is locally concave, which facilitates fast convergence of the proposed algorithm. The framework's practical effectiveness and efficiency are demonstrated through two numerical examples: Merton's problem and an investment problem with unhedgeable risks. Beyond advancing financial technology through more personalized investment advice, the framework also contributes to other fields, such as healthcare, economics, and artificial intelligence, where understanding individual preferences is crucial.
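To make the combination of an entropy-regularized MDP and maximum likelihood estimation more concrete, here is a minimal Python sketch on a hypothetical toy problem; it is not the authors' algorithm or model. Every specific below is our assumption: integer wealth with consumption as the only action, a 5-period horizon, a hyperbolic discount $\beta(t) = 1/(1+kt)$, a shifted CRRA utility with curvature `gamma` as the unknown preference parameter, an entropy temperature `LAM`, and a precommitment view from time 0 that sidesteps the time-inconsistency issue. Entropy regularization makes the optimal policy a softmax of the soft Q-values, so every action has positive probability and the log-likelihood of observed choices is always finite. Rather than sampling trajectories, the code evaluates the population (expected) log-likelihood under the true policy. By Gibbs' inequality the true `gamma` then maximizes it. A coarse grid search stands in for a gradient-based optimizer.

```python
import math

# Hypothetical toy setting (not the paper's model): integer wealth, a horizon of
# T periods, a hyperbolic discount beta(t), and a shifted CRRA utility whose
# curvature gamma is the preference parameter to be recovered.
W0, T, LAM = 10, 5, 0.5   # initial wealth, horizon, entropy temperature
GAMMA_TRUE = 1.5          # hypothetical "true" preference parameter


def discount(t: int, k: float = 0.5) -> float:
    """Hyperbolic discount factor, a stand-in for a general time-varying beta."""
    return 1.0 / (1.0 + k * t)


def utility(c: int, gamma: float) -> float:
    """Shifted CRRA utility ((c+1)^(1-gamma) - 1) / (1-gamma); log(1+c) at gamma = 1."""
    if abs(gamma - 1.0) < 1e-9:
        return math.log1p(c)
    return ((c + 1) ** (1.0 - gamma) - 1.0) / (1.0 - gamma)


def soft_log_policy(gamma: float) -> list[list[list[float]]]:
    """Backward induction for the entropy-regularized problem.

    Returns logpi[t][w][c] = log pi_t(c | w): the softmax (Boltzmann) policy that
    maximizes E[sum_t beta(t) u(c_t) + LAM * entropy(pi_t(. | w_t))] from time 0.
    """
    v_next = [0.0] * (W0 + 1)  # terminal value: no utility from leftover wealth
    logpi = [[[] for _ in range(W0 + 1)] for _ in range(T)]
    for t in reversed(range(T)):
        v_now = [0.0] * (W0 + 1)
        for w in range(W0 + 1):
            # Q_t(w, c) = beta(t) u(c) + V_{t+1}(w - c) for consumption c = 0..w
            q = [discount(t) * utility(c, gamma) + v_next[w - c] for c in range(w + 1)]
            m = max(q)
            # Soft value V_t(w) = LAM * log sum_c exp(Q_t(w, c) / LAM), computed stably
            v_now[w] = m + LAM * math.log(sum(math.exp((x - m) / LAM) for x in q))
            # Softmax policy in log form: log pi = (Q - V) / LAM
            logpi[t][w] = [(x - v_now[w]) / LAM for x in q]
        v_next = v_now
    return logpi


def occupancy(logpi: list[list[list[float]]]) -> list[list[float]]:
    """Distribution d[t][w] of wealth at time t when starting from W0 and following logpi."""
    d = [[0.0] * (W0 + 1) for _ in range(T)]
    d[0][W0] = 1.0
    for t in range(T - 1):
        for w in range(W0 + 1):
            for c, lp in enumerate(logpi[t][w]):
                d[t + 1][w - c] += d[t][w] * math.exp(lp)
    return d


def expected_loglik(gamma: float, logpi_true, d_true) -> float:
    """Population log-likelihood of gamma for choices generated by logpi_true."""
    logpi = soft_log_policy(gamma)
    return sum(
        d_true[t][w] * math.exp(lp_true) * logpi[t][w][c]
        for t in range(T)
        for w in range(W0 + 1)
        for c, lp_true in enumerate(logpi_true[t][w])
    )


logpi_true = soft_log_policy(GAMMA_TRUE)
d_true = occupancy(logpi_true)
# A coarse grid search over gamma in [0.2, 3.0] stands in for gradient ascent.
grid = [0.2 + 0.01 * k for k in range(281)]
gamma_hat = max(grid, key=lambda g: expected_loglik(g, logpi_true, d_true))
print(f"true gamma = {GAMMA_TRUE}, estimate = {gamma_hat:.2f}")
```

Running the script should report an estimate within one grid step of the assumed true value. Replacing the grid search with gradient ascent on `expected_loglik`, started near the truth, is the kind of local optimization that a local concavity result is meant to support.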
