Sampling effects on Lasso estimation of drift functions in high-dimensional diffusion processes

Chiara Amorino; Francisco Pina; Mark Podolskij

TL;DR

The paper tackles high-dimensional drift estimation for diffusion processes observed at discrete times under a sparsity assumption. It proves an oracle inequality for the Lasso estimator by controlling the probability of three events (bounded martingale fluctuations, bounded discretization bias, and a compatibility condition) and shows that, under suitable sampling, the discretization error is negligible, so the estimator attains the optimal rate of the continuous-observation setting. Two models are analyzed: a general linear drift and the Ornstein-Uhlenbeck (OU) process, with concentration inequalities tailored to each setting (martingale-based and Malliavin-calculus-based, respectively). Numerical experiments complement the theory and show that the Lasso recovers the support better than the maximum likelihood estimator (MLE) in high dimensions. Overall, the work provides explicit error bounds and describes how sampling, dimension, and sparsity interact to determine convergence rates in discretely observed high-dimensional diffusions.

Abstract

In this paper, we address high-dimensional parametric estimation of the drift function in diffusion models, specifically focusing on a $d$-dimensional ergodic diffusion process observed at discrete time points. We consider both a general linear form for the drift function and the particular case of the Ornstein-Uhlenbeck (OU) process. Assuming sparsity of the parameter vector, we examine the statistical behavior of the Lasso estimator for the unknown parameter. Our primary contribution is the proof of an oracle inequality for the Lasso estimator, which holds on the intersection of three specific sets defined for our analysis. We carefully control the probability of these sets, tackling the central challenge of our study. This approach allows us to derive error bounds for the $l_1$ and $l_2$ norms, assessing the performance of the proposed Lasso estimator. Our results demonstrate that, under certain conditions, the discretization error becomes negligible, enabling us to achieve the same optimal rate of convergence as if the continuous trajectory of the process were observed. We validate our theoretical findings through numerical experiments, which show that the Lasso estimator significantly outperforms the maximum likelihood estimator (MLE) in terms of support recovery.
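To make the setting concrete, here is a minimal Python sketch of Lasso drift estimation for a discretely observed OU process. Everything in it is an illustrative assumption rather than the paper's construction: the model $dX_t = -A_0 X_t\,dt + dW_t$ with unit diffusion coefficient, the matrix parametrization, the normalization of the discretized contrast by the horizon $T$, the proximal-gradient (ISTA) solver, the helper names `simulate_ou`, `lasso_drift`, and `mle_drift`, and the value of the tuning parameter `lam`.

```python
import numpy as np

def simulate_ou(A0, n_steps, delta, rng, substeps=10):
    """Euler-Maruyama path of dX = -A0 X dt + dW on a fine grid,
    recorded only every `delta` time units (the discrete observations)."""
    d = A0.shape[0]
    h = delta / substeps
    X = np.zeros((n_steps + 1, d))
    x = np.zeros(d)
    for i in range(1, n_steps + 1):
        for _ in range(substeps):
            x = x - (A0 @ x) * h + np.sqrt(h) * rng.standard_normal(d)
        X[i] = x
    return X

def _sufficient_stats(X, delta):
    """D = (1/T) sum_i dX_i X_i^T and C = (delta/T) sum_i X_i X_i^T."""
    T = delta * (X.shape[0] - 1)
    X_prev, dX = X[:-1], np.diff(X, axis=0)
    D = dX.T @ X_prev / T
    C = delta * (X_prev.T @ X_prev) / T
    return D, C

def soft_threshold(M, t):
    """Entrywise soft-thresholding, the proximal map of t * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def lasso_drift(X, delta, lam, n_iter=500):
    """ISTA for  min_A (1/T) sum_i [ (A X_i).dX_i + 0.5*delta*|A X_i|^2 ] + lam*||A||_1."""
    D, C = _sufficient_stats(X, delta)
    step = 1.0 / np.linalg.eigvalsh(C).max()  # 1 / Lipschitz constant of the gradient
    A = np.zeros_like(C)
    for _ in range(n_iter):
        grad = D + A @ C                      # gradient of the smooth part
        A = soft_threshold(A - step * grad, step * lam)
    return A

def mle_drift(X, delta):
    """Unpenalized minimizer of the same contrast: A = -D C^{-1}."""
    D, C = _sufficient_stats(X, delta)
    return -np.linalg.solve(C, D.T).T         # C is symmetric

# Hypothetical example: d = 10, sparse drift matrix with 12 nonzero entries.
rng = np.random.default_rng(0)
d = 10
A0 = np.eye(d)
A0[0, 1], A0[3, 7] = 0.5, -0.5
X = simulate_ou(A0, n_steps=4000, delta=0.05, rng=rng)
A_lasso = lasso_drift(X, delta=0.05, lam=0.15)   # lam chosen by hand
A_mle = mle_drift(X, delta=0.05)
print("nonzero entries  Lasso:", np.count_nonzero(A_lasso),
      " MLE:", np.count_nonzero(A_mle))
```

The soft-thresholding step sets small entries exactly to zero, which is what gives the Lasso a sparse estimate, while the unpenalized estimator is generically dense. In this sketch, shrinking `delta` at a fixed horizon `delta * n_steps` reduces the discretization bias of the contrast.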

Paper Structure

This paper contains 26 sections, 20 theorems, 190 equations, and 4 figures.

Introduction
Notation
Assumptions
Assumptions for the general linear drift
Assumptions for the OU case
Main results
Main results for the general linear case
Main results for the OU case
The concentration inequalities
Concentration inequality for the general linear drift model
Concentration inequality for the OU case
Numerical results
Conclusions and outlook
Proof of main results
Proof of the oracle inequality
...and 11 more sections

Key Result

Theorem 2.1

Assume that $\|\theta_0\|_0 = s$ and that the drift function $b_{\theta}$ in the model is differentiable with respect to $\theta$. Then, for any $\gamma>0$, the oracle inequality holds on $\mathcal{T} \cap \mathcal{T}' \cap \mathcal{T}''$, where $\lambda$, $s$, and $k$ are the tuning parameter of the Lasso estimator, the sparsity of $\theta_0$, and the constant appearing in $\mathcal{T}''$, respectively.

Figures (4)

Figure 1: Simulated sample paths of the diffusion process. Each panel shows the evolution of one coordinate of the process.
Figure 2: Comparison of the sparse true parameter with the estimates obtained by the MLE and the Lasso estimator.
Figure 3: $l_1$ mean error of the MLE and the Lasso estimator $\pm$ one standard deviation.
Figure 4: $l_2$ mean error of the MLE and the Lasso estimator $\pm$ one standard deviation.
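
The quantities named in these captions, together with support recovery, can be computed as in the following sketch. The function names and the toy parameter vectors are hypothetical and chosen only for illustration; none of the numbers come from the paper's experiments.

```python
import numpy as np

def estimation_errors(theta_hat, theta_0):
    """Return the l1 and l2 norms of the estimation error theta_hat - theta_0."""
    diff = np.ravel(theta_hat) - np.ravel(theta_0)
    return float(np.abs(diff).sum()), float(np.sqrt((diff ** 2).sum()))

def support_recovery(theta_hat, theta_0, tol=0.0):
    """Count correctly and incorrectly selected coordinates.
    A coordinate is 'selected' when |theta_hat| exceeds tol."""
    selected = np.abs(np.ravel(theta_hat)) > tol
    active = np.ravel(theta_0) != 0
    return {
        "true_positives": int(np.sum(selected & active)),
        "false_positives": int(np.sum(selected & ~active)),
        "false_negatives": int(np.sum(~selected & active)),
    }

# Toy vectors (made up): a sparse truth, a sparse estimate, a dense estimate.
theta_0 = np.array([1.0, 0.0, 0.0, -0.5])
theta_sparse = np.array([0.8, 0.0, 0.0, -0.3])
theta_dense = np.array([1.1, 0.2, -0.1, -0.6])

for name, est in [("sparse", theta_sparse), ("dense", theta_dense)]:
    l1, l2 = estimation_errors(est, theta_0)
    print(name, round(l1, 3), round(l2, 3), support_recovery(est, theta_0))
```

In this toy example the two estimates have similar $l_2$ errors, but only the sparse one identifies the zero coordinates, which is why support recovery is worth reporting separately from norm errors.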

Theorems & Definitions (48)

Remark 1.1
Remark 1.2
Remark 1.3
Remark 1.4
Theorem 2.1
Theorem 2.2
proof
Corollary 2.3
proof
Corollary 2.4
...and 38 more
