Eliciting Risk Aversion with Inverse Reinforcement Learning via Interactive Questioning

Ziteng Cheng, Anthony Coache, Sebastian Jaimungal

TL;DR

The paper develops an inverse reinforcement learning framework to elicit non-expert clients’ risk aversion via adaptive binary questions in a robo-advising setting, modeling risk with spectral risk measures. It proves finite-sample identifiability and a $\sqrt{N}$-type convergence rate (up to a logarithmic factor) for well-designed questions, introduces distinguishing power and indifference curves to quantify question effectiveness, and provides constructive lower bounds on discrimination capabilities. Through simulations, design-based questioning markedly outperforms random questioning, yielding satisfactory estimates with fewer than 50 questions, and extends to a richer, near-continuous risk space using a particle-based approach. The work also offers a preliminary infinite-horizon analysis with discounting and discusses promising future directions, including multiple-choice questions, Bayesian formulations, and function-approximation methods for dynamic or continuous settings. Overall, the framework advances efficient, data-driven elicitation of risk preferences for robo-advisors and contributes novel theory on identifiability and learning rates in non-parametric IRL with spectral risk measures.

Abstract

We investigate a framework for robo-advisors to estimate non-expert clients' risk aversion using adaptive binary-choice questionnaires. We model risk aversion using cost functions and spectral risk measures in a static setting. We prove the finite-sample identifiability and, for properly designed questions, obtain a convergence rate of $\sqrt{N}$ up to a logarithmic factor, where $N$ is the number of questions. We introduce the notion of distinguishing power and demonstrate, through simulated experiments, that designing questions by maximizing distinguishing power achieves satisfactory accuracy in learning risk aversion with fewer than 50 questions. We also provide a preliminary investigation of an infinite-horizon setting with an additional discount factor for dynamic risk aversion, establishing qualitative identifiability in this case.
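
The adaptive questioning loop described in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration rather than the paper's construction: the candidate risk preferences are a handful of CVaR levels (one simple family of spectral risk measures), the simulated client answers without noise, and the "most even split" rule in `pick_question` is only a stand-in for the paper's distinguishing power, whose definition is not reproduced here.

```python
import numpy as np

def cvar(costs, beta):
    """Empirical CVaR of equally likely costs: mean of the worst (1 - beta) share."""
    x = np.sort(np.asarray(costs, dtype=float))
    k = min(int(np.floor(beta * len(x))), len(x) - 1)
    return x[k:].mean()

def prefers_a(beta, question):
    """True if an agent with CVaR level beta weakly prefers lottery A (lower risk)."""
    a, b = question
    return cvar(a, beta) <= cvar(b, beta)

def pick_question(pool, candidates):
    """Stand-in design rule (hypothetical): split the remaining candidates most evenly."""
    def imbalance(q):
        yes = sum(prefers_a(c, q) for c in candidates)
        return abs(2 * yes - len(candidates))
    return min(pool, key=imbalance)

def elicit(true_beta, candidates, pool, n_questions):
    """Ask binary questions and keep only candidates consistent with every answer."""
    survivors = list(candidates)
    for _ in range(n_questions):
        if len(survivors) <= 1:
            break
        q = pick_question(pool, survivors)
        answer = prefers_a(true_beta, q)  # simulated, noise-free client (assumption)
        survivors = [c for c in survivors if prefers_a(c, q) == answer]
    return survivors

rng = np.random.default_rng(0)
candidates = [0.0, 0.2, 0.4, 0.6, 0.8]  # hypothetical candidate CVaR levels
# Each question is a pair of cost samples (lottery A, lottery B); values are made up.
pool = [(rng.normal(0.0, s, 20), rng.normal(m, 0.5, 20))
        for s in (0.5, 1.0, 2.0) for m in (0.0, 0.5, 1.0)]
survivors = elicit(0.6, candidates, pool, n_questions=10)
```

Because the simulated client answers consistently with its own level, the true level always remains in `survivors`. How quickly the other candidates are eliminated depends on the question pool, which is the role the paper assigns to question design.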

Paper Structure

This paper contains 41 sections, 16 theorems, 200 equations, 9 figures, 1 table, 1 algorithm.

Key Result

Lemma 2.2

For $\mu\in\mathcal{P}([0,1])$ satisfying $\mu(1)=0$, let $\sigma_\mu$ be defined as in eq:Defsigma. Then $\sigma_\mu$ is nonnegative, nondecreasing, right continuous, and $\int_0^1\sigma_\mu(\alpha)\,\mathrm{d}\alpha=1$. Moreover, $\sigma_\mu$ characterizes $\mu$ and $\rho_\mu$ ...
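
The properties listed in the lemma can be checked numerically under one common convention. The definition behind eq:Defsigma is not shown on this page, so the sketch below assumes the standard form $\sigma_\mu(\alpha)=\int_{[0,\alpha]}(1-\beta)^{-1}\,\mu(\mathrm{d}\beta)$ and takes $\rho_\mu$ to be the $\mu$-mixture of CVaR at levels $\beta$. Both are assumptions, as are the discrete $\mu$ and the cost sample used here.

```python
import numpy as np

def sigma_mu(alpha, levels, weights):
    """Assumed spectrum: sum of w_i / (1 - beta_i) over atoms beta_i <= alpha."""
    alpha = np.atleast_1d(np.asarray(alpha, dtype=float))
    levels = np.asarray(levels, dtype=float)
    density = np.asarray(weights, dtype=float) / (1.0 - levels)
    return ((levels[None, :] <= alpha[:, None]) * density[None, :]).sum(axis=1)

def cvar(costs, beta):
    """Empirical CVaR of costs: average of the worst (1 - beta) fraction.

    Assumes beta * len(costs) is an integer, so no sample point is split.
    """
    x = np.sort(np.asarray(costs, dtype=float))
    k = int(round(beta * len(x)))
    return x[k:].mean()

def spectral_risk_mixture(costs, levels, weights):
    """rho_mu as a weighted mixture of CVaR values (assumed representation)."""
    return sum(w * cvar(costs, b) for b, w in zip(levels, weights))

def spectral_risk_spectrum(costs, levels, weights, cells=1000):
    """rho_mu as the integral of sigma_mu times the empirical quantile function."""
    x = np.sort(np.asarray(costs, dtype=float))
    mid = (np.arange(cells) + 0.5) / cells            # midpoint rule on [0, 1]
    quantile = x[np.floor(mid * len(x)).astype(int)]  # empirical quantiles
    return float(np.mean(sigma_mu(mid, levels, weights) * quantile))

# Hypothetical discrete mu with atoms at 0, 0.5, 0.9 and no mass at 1.
levels, weights = [0.0, 0.5, 0.9], [0.2, 0.5, 0.3]
costs = [3.0, -1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]

mid = (np.arange(1000) + 0.5) / 1000
sigma_vals = sigma_mu(mid, levels, weights)
total_mass = float(sigma_vals.mean())  # approximates the integral of sigma_mu
rho_mix = spectral_risk_mixture(costs, levels, weights)
rho_spec = spectral_risk_spectrum(costs, levels, weights)
```

In this example `sigma_vals` is nonnegative and nondecreasing, `total_mass` equals 1 up to floating-point error, and the two ways of computing the risk agree. That agreement is one sense in which the spectrum determines the risk measure under the assumed convention.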

Figures (9)

  • Figure 1: Illustration of a separating environment.
  • Figure 2: Convergence of $\mathbb{Q}_N$ in the one-period setting.
  • Figure 3: Evolution of the designed questions and $\mathbb{Q}_N$ in the one-period setting for $k=4$.
  • Figure 4: Evolution of the designed questions and $\mathbb{Q}_N$ in the one-period setting for $k=10$.
  • Figure 5: Evolution of $\mathbb{Q}_N$ in the one-period setting with misspecification.
  • ...and 4 more figures

Theorems & Definitions (46)

  • Remark 2.1
  • Lemma 2.2
  • Remark 2.3
  • Remark 2.4
  • Proposition 3.1
  • Lemma 3.2
  • Theorem 4.2
  • Remark 4.3
  • Remark 4.4
  • Theorem 4.6
  • ...and 36 more