Time-Inhomogeneous Volatility Aversion for Financial Applications of Reinforcement Learning

Federico Cacciamani; Roberto Daluiso; Marco Pinciroli; Michele Trapletti; Edoardo Vittori

TL;DR

The paper addresses the limitation of traditional RL, which optimizes expected return, for financial sequential decision problems where the time distribution of profits and losses matters. It introduces a time-inhomogeneous risk objective by penalizing deviations from stepwise conditional means, via $\mathbb{E}_\pi[\mathcal{G}] - \beta\varsigma^2_\pi$, and generalizes with inhomogeneous $\ell$-volatility and optimised certainty equivalents, enabling arbitrary per-step targets. The authors develop policy-gradient–based algorithms (IVE/IVO) with a Bellman-like target $X_{\pi,i}$ and derive the gradient of the volatility term, including nested optimization over step targets; they validate the approach on toy environments, a deterministic-horizon optimal execution task, and a stochastic-horizon grid world. The results show that the time-inhomogeneous objective can produce more financially sensible risk-aware policies than homogeneous risk criteria, especially in execution problems, while also providing a flexible framework for further extensions in finance. Overall, this framework enables time-aware risk control in RL for finance, with potential applicability to hedging and budgeting where the timing of rewards is crucial.

Abstract

Sequential decision problems are common in finance, and reinforcement learning (RL) is a promising tool for optimising them without requiring analytical tractability. However, classical RL maximises the expected cumulative reward, while financial applications typically require a trade-off between return and risk. In this work, we focus on settings where one cares about how the total return is split over time, which rules out most risk-aware generalisations of RL, since they optimise a risk measure defined on the total return. We observe that a preference for homogeneous splits, which we found satisfactory for hedging, can be unfit for other problems. We therefore propose a new risk metric that still penalises uncertainty of the individual rewards but allows their target levels to be planned arbitrarily. We study the properties of the resulting objective and the generalisation of learning algorithms to optimise it. Finally, we show numerical results on toy examples.
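To make the contrast between homogeneous and inhomogeneous targets concrete, the following is a minimal Monte Carlo sketch of an objective of the form $\mathbb{E}_\pi[\mathcal{G}] - \beta\varsigma^2_\pi$. It is not the paper's estimator: the function names, the fixed-horizon array layout, the use of the unconditional per-step sample mean as a stand-in for the stepwise conditional mean, and the simplified homogeneous baseline (one common target for all steps) are all assumptions made for illustration.

```python
import numpy as np


def inhomogeneous_mean_volatility(rewards, beta, gamma=1.0):
    """Sample estimate of mean return minus beta times a per-step volatility.

    rewards: array of shape (n_episodes, horizon), one row per episode.
    Assumption: each step's target is the sample mean of that step's reward,
    a simplified stand-in for the stepwise conditional mean.
    """
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(rewards.shape[1])
    mean_return = (rewards * discounts).sum(axis=1).mean()
    step_targets = rewards.mean(axis=0)  # one target per time step
    sq_dev = (rewards - step_targets) ** 2
    volatility = (sq_dev * discounts).sum(axis=1).mean()
    return mean_return - beta * volatility, mean_return, volatility


def homogeneous_mean_volatility(rewards, beta, gamma=1.0):
    """Same estimate, but with a single common target for every step.

    Assumption: the common target is the plain average of all rewards,
    a simplified version of a homogeneous reward-volatility criterion.
    """
    rewards = np.asarray(rewards, dtype=float)
    discounts = gamma ** np.arange(rewards.shape[1])
    mean_return = (rewards * discounts).sum(axis=1).mean()
    sq_dev = (rewards - rewards.mean()) ** 2
    volatility = (sq_dev * discounts).sum(axis=1).mean()
    return mean_return - beta * volatility, mean_return, volatility


# Deterministic but uneven reward schedule: 5 at the first step, 0 at the second.
uneven = [[5.0, 0.0], [5.0, 0.0]]
obj_inhom, _, vol_inhom = inhomogeneous_mean_volatility(uneven, beta=0.5)
obj_hom, _, vol_hom = homogeneous_mean_volatility(uneven, beta=0.5)
```

In this toy case the rewards carry no uncertainty, so the per-step version assigns zero volatility, while the single-target version penalises the uneven time split itself. This is the kind of behaviour the abstract describes as unfit for some problems.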

Paper Structure (14 sections, 4 theorems, 38 equations, 6 figures)

This paper contains 14 sections, 4 theorems, 38 equations, 6 figures.

Introduction
Related literature
Theory
Definitions and properties
Algorithms
Examples
Toy example
Deterministic-horizon example: optimal execution
Stochastic-horizon example: grid world
Inhomogeneous mean-volatility
Inhomogeneous monotone mean-volatility
Inhomogeneous optimised certainty equivalent
Dynamic monotone hull and IVO
Conclusion

Key Result

Theorem 2.5

For any policy $\pi$, it holds that

Figures (6)

Figure 1: Optimal execution paths for different risk aversion coefficients in the (homogeneous) mean-volatility objective, obtained by the TRVO algorithm.
Figure 2: Optimal execution paths for different risk aversion coefficients in the inhomogeneous mean-volatility objective, obtained by the IVO algorithm.
Figure 3: Grid-world instance used in the stochastic-horizon grid-world example, taken from moldovan2012risk. Each episode starts from the cell marked by x and ends when the green cell is reached with reward +35, or a red cell is reached with reward -35. On every other time step the reward is -1, for a maximum of 35 time steps.
Figure 4: Path generated in a noiseless test environment by optimal policies for the inhomogeneous mean-volatility objective, for a selection of values of the risk aversion $\beta$.
Figure 5: Path generated in a noiseless test environment by optimal policies for the inhomogeneous monotone mean-volatility objective, for a selection of values of the risk aversion $\beta$.
...and 1 more figure

Theorems & Definitions (17)

Definition 2.1: Inhomogeneous reward volatility
Definition 2.2: Inhomogeneous mean-volatility
Remark 2.3: Comparison to homogeneous definitions
Remark 2.4: Irreducibility to classical RL
Theorem 2.5: Volatility inequality
proof
Theorem 2.6: Variance inequality
proof
Definition 2.7: Inhomogeneous $\ell$-volatility
Example 2.8: Inhomogeneous optimised certainty equivalent
...and 7 more
