Statistical Learning with Sublinear Regret of Propagator Models

Eyal Neuman, Yufei Zhang

TL;DR

This work addresses optimal execution under unknown transient price impact governed by a propagator kernel $G$ and a temporary impact parameter $\lambda$. It advances a nonparametric, kernel-based learning framework that alternates exploration and exploitation, using a regularised least-squares estimator to identify $(\lambda,G)$ from visible prices and a price predictor signal. The authors prove high-probability, sublinear regret for a phased learning algorithm and derive convergence rates that depend on kernel regularity, including optimal rates for regular and power-law singular kernels. A rigorous Lipschitz stability analysis of the infinite-dimensional control problem under model misspecification enables quantitative bounds on the performance gap, with complementary numerical implementations illustrating the kernel estimation behavior for singular propagators.

Abstract

We consider a class of learning problems in which an agent liquidates a risky asset while creating both transient price impact driven by an unknown convolution propagator and linear temporary price impact with an unknown parameter. We characterize the trader's performance as maximization of a revenue-risk functional, where the trader also exploits available information on a price predicting signal. We present a trading algorithm that alternates between exploration and exploitation phases and achieves sublinear regrets with high probability. For the exploration phase we propose a novel approach for non-parametric estimation of the price impact kernel by observing only the visible price process and derive sharp bounds on the convergence rate, which are characterised by the singularity of the propagator. These kernel estimation methods extend existing methods from the area of Tikhonov regularisation for inverse problems and are of independent interest. The bound on the regret in the exploitation phase is obtained by deriving stability results for the optimizer and value function of the associated class of infinite-dimensional stochastic control problems. As a complementary result we propose a regression-based algorithm to estimate the conditional expectation of non-Markovian signals and derive its convergence rate.
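
The regularised least-squares idea behind the exploration phase can be illustrated with a minimal sketch. It is not the authors' estimator: it assumes a simplified discrete-time setting in which the transient impact $\int_0^t G(t-s)u_s\,ds$ is observed directly with additive Gaussian noise, it ignores the temporary impact parameter and the signal, and it uses a plain ridge (Tikhonov) penalty. All function names, the grid, the noise level and the regularisation weight are hypothetical choices made for illustration.

```python
import numpy as np

def convolution_matrix(u, dt):
    """Lower-triangular Toeplitz matrix A with (A @ g)[i] = dt * sum_{k<=i} u[i-k] * g[k]."""
    m = len(u)
    A = np.zeros((m, m))
    for i in range(m):
        # Column k multiplies g[k]; the matching trading rate is u[i-k].
        A[i, : i + 1] = dt * u[i::-1]
    return A

def estimate_kernel(rates, impacts, dt, reg):
    """Ridge-regularised least squares for the discretised kernel g.

    Solves (sum_n A_n^T A_n + reg * I) g = sum_n A_n^T y_n over all episodes.
    """
    m = rates.shape[1]
    gram = reg * np.eye(m)
    rhs = np.zeros(m)
    for u, y in zip(rates, impacts):
        A = convolution_matrix(u, dt)
        gram += A.T @ A
        rhs += A.T @ y
    return np.linalg.solve(gram, rhs)

def relative_error(n_episodes, m=50, T=1.0, alpha=0.4, noise=0.05, reg=1e-3, seed=0):
    """Simulate episodes with a power-law kernel and return the relative L2 error."""
    rng = np.random.default_rng(seed)
    dt = T / m
    t = (np.arange(m) + 0.5) * dt          # midpoints avoid the singularity at t = 0
    g_true = t ** (-alpha)                  # assumed true kernel G*(t) = t^(-alpha)
    rates = rng.normal(size=(n_episodes, m))  # random exploratory trading rates
    impacts = np.stack([convolution_matrix(u, dt) @ g_true for u in rates])
    impacts += noise * rng.normal(size=impacts.shape)  # observation noise
    g_hat = estimate_kernel(rates, impacts, dt, reg)
    return np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true)

if __name__ == "__main__":
    for n in (5, 50, 400):
        print(n, relative_error(n))
```

Calling `relative_error` with increasing `n_episodes` gives a rough feel for how the estimate improves with the number of exploration episodes in this toy setting. The rates it exhibits should not be read as the convergence rates proved in the paper.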

Paper Structure (26 sections, 23 theorems, 166 equations, 2 figures, 1 algorithm)

Key Result

Lemma 2.9

Suppose there exist $L,\sigma>0$ such that for all $p\ge 2$, ${\mathbb{E}}[(|M_0|^2 +\|M\|^2_{L^2([0,T])})^{p/2}]\le \frac{1}{2}p!\sigma^2 L^{p-2}$. Then the concentration assumption on $M$ (Assumption assum:concentration_M in the source) holds with $C_M=2(L+\sigma)$.
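
As a worked illustration of this moment condition (an example added here, not taken from the paper), write $X=(|M_0|^2+\|M\|^2_{L^2([0,T])})^{1/2}$, read $p$ as an integer, and assume that $X\le B$ almost surely for some hypothetical constant $B>0$. Since $p!\ge 2$ for every integer $p\ge 2$,

$${\mathbb{E}}[X^{p}]\le B^{p}=B^{2}B^{p-2}\le \frac{1}{2}\,p!\,B^{2}B^{p-2},$$

so the condition holds with $\sigma=L=B$, and the lemma would then give $C_M=2(L+\sigma)=4B$.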

Figures (2)

  • Figure 1: Comparison between the true power law kernels $G^\star(t)=t^{-\alpha}$, with $\alpha =0.1$ (in blue) and $\alpha =0.4$ (in orange), and the estimated kernels with different sample sizes $N$.
  • Figure 2: Mean relative errors of $G^N$ for different sample sizes $N$ (solid lines), with the lighter regions showing the intervals containing the errors; both axes are on a log scale. The true kernels are the power laws $G^\star(t)=t^{-\alpha}$ with $\alpha =0.1$ (upper panel) and $\alpha =0.4$ (lower panel).

Theorems & Definitions (56)

  • Remark 2.1
  • Remark 2.3
  • Remark 2.4
  • Remark 2.5
  • Remark 2.7
  • Lemma 2.9
  • Theorem 2.10
  • Remark 2.11
  • Remark 2.12
  • Definition 2.14: Class of admissible parameters $\Xi_{\varepsilon}$
  • ...and 46 more