Time-Uniform Self-Normalized Concentration for Vector-Valued Processes

Justin Whitehouse, Zhiwei Steven Wu, Aaditya Ramdas

TL;DR

This work develops time-uniform self-normalized concentration bounds for vector-valued martingales under a general sub-$\psi$ tail condition, extending scalar results to $\mathbb{R}^d$ via directional projections and a geometric sphere-covering argument. The main contributions include a scalar bound $S_t \lesssim V_t (\psi^*)^{-1}(\frac{1}{V_t}\log\log V_t)$, a corresponding vector bound on $\| (V_t)^{-1/2} S_t \|$, and a tight multivariate law of the iterated logarithm with explicit dependence on $\gamma_{\max}(V_t)$ and $\kappa(V_t)$. The results yield non-asymptotic, time-uniform confidence ellipsoids for online linear regression under sub-$\psi$ noise, a multivariate empirical Bernstein inequality for bounded vectors, and extend to vector autoregressive models, offering practical tools for sequential estimation under heavy-tailed or dependent noise. By providing closed-form bounds with controllable constants, the framework broadens applicability beyond sub-Gaussian settings and enables robust analysis of adaptive statistical procedures in online and time-series contexts.
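To make the scalar rate concrete, consider the sub-Gaussian special case. The block below is a worked check rather than a result quoted from the paper: it assumes the standard normalization $\psi_N(\lambda) = \lambda^2/2$, which this summary does not state explicitly. Under that assumption the convex conjugate and its inverse are explicit, and the bound reduces to the familiar iterated-logarithm rate, up to the constants hidden in $\lesssim$:

```latex
\begin{aligned}
\psi_N^*(u) &= \sup_{\lambda \geq 0} \left( \lambda u - \tfrac{\lambda^2}{2} \right) = \tfrac{u^2}{2} \quad (u \geq 0),
\qquad
(\psi_N^*)^{-1}(y) = \sqrt{2y}, \\
V_t \, (\psi_N^*)^{-1}\!\left( \tfrac{1}{V_t} \log\log V_t \right)
&= V_t \sqrt{\tfrac{2 \log\log V_t}{V_t}}
= \sqrt{2 V_t \log\log V_t}.
\end{aligned}
```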

Abstract

Self-normalized processes arise naturally in many learning-related tasks. While self-normalized concentration has been extensively studied for scalar-valued processes, there are few results for multidimensional processes outside of the sub-Gaussian setting. In this work, we construct a general, self-normalized inequality for multivariate processes that satisfy a simple yet broad sub-$\psi$ tail condition, which generalizes assumptions based on cumulant generating functions. From this general inequality, we derive an upper law of the iterated logarithm for sub-$\psi$ vector-valued processes, which is tight up to small constants. We show how our inequality can be leveraged to derive a variety of novel, self-normalized concentration inequalities under both light and heavy-tailed observations. Further, we provide applications in prototypical statistical tasks, such as parameter estimation in online linear regression, autoregressive modeling, and bounded mean estimation via a new (multivariate) empirical Bernstein concentration inequality.

Paper Structure (27 sections, 21 theorems, 105 equations, 4 figures)

Key Result

Proposition 2.4

Suppose $(S_t, V_t)_{t \geq 0}$ is sub-$\psi$ with (inherently with respect to some filtration $(\mathcal{F}_t)_{t \geq 0}$). Then, for any fixed $\rho >0$,

Figures (4)

  • Figure 1: Comparing the four CGF-like functions $\psi_N, \psi_{P, c}, \psi_{E, c}$, and $\psi_{G, c}$ discussed throughout this section. The first panel illustrates the implications among sub-$\psi$ processes: sub-$\psi_{N} \Rightarrow$ sub-$\psi_{P, c} \Rightarrow$ sub-$\psi_{E, c} \Rightarrow$ sub-$\psi_{G, c}$. That is, of all the CGFs considered, $\psi_N$ represents the lightest tails and $\psi_{G, c}$ the heaviest: sub-Gaussian processes are sub-Gamma, but not vice versa. The second panel illustrates this by plotting $\psi(\lambda)$ for $\lambda \in [0, 1)$ with $c = 1$.
  • Figure 2: Comparing the boundary of Theorem \ref{thm:scalar} in the case $\psi = \psi_{P, c}$ with the boundary of Theorem 1 in Howard et al. (2021), recapped in Equation \ref{eq:bdry_gamma}. Note that to apply the boundary of Howard et al. (2021), we need to leverage the fact that a sub-$\psi_{P, c}$ process $(S_t)_{t \geq 0}$ is also sub-$\psi_{G, c}$ with the same variance proxy $(V_t)_{t \geq 0}$. We have made the parameter selection $c = 1$, $\delta = 0.01$, $\rho = 1$, and $h(k) = (1 + k)^2\zeta(2)$, and have correspondingly varied $\alpha$ over several values. We see that for reasonably small choices of intrinsic time spacing $\alpha > 1$, our boundary is tighter than that of Howard et al. (2021). Thus, although a sub-$\psi_{P, c}$ process can be viewed as a sub-$\psi_{G, c}$ process, this conversion introduces looseness, making our time-uniform concentration result generally preferable in this setting.
  • Figure 3: Comparing the boundary of Theorem \ref{thm:scalar} in the case $\psi = \psi_{G, c}$ with the boundary of Howard et al. (2021) (presented in Equation \ref{eq:bdry_gamma}). We have made the parameter selection $c = 1$, $\delta = 0.01$, $\rho = 1$, and $h(k) = (1 + k)^2\zeta(2)$, and have correspondingly varied $\alpha$ over several values. As expected from our discussion, our boundary is looser than that of Howard et al. (2021) for all values of $\alpha$, with the gap between the boundaries vanishing as the geometric spacing $\alpha$ of variance/intrinsic time is decreased towards 1. Since $\alpha = 1.01$ or $\alpha = 1.05$ are reasonable choices for applying our concentration inequalities, our bounds are just as applicable as those of Howard et al. (2021) even in the sub-Gamma setting.
  • Figure 4: A comparison of the bounds on $|\widehat{a}_t - a|$ provided by Fact \ref{fact:ar_model} and Corollary \ref{cor:auto}. In plotting both bounds, we have fixed the failure probability as $\delta = 0.01$. We have numerically solved for $x$ such that the right-hand side of Fact \ref{fact:ar_model} is equal to the target failure probability. When applying Corollary \ref{cor:auto}, we have set $\alpha = 1.5$, $h(k) = (1 + k)^2\zeta(2)$, and $\rho = 1$, and note that dependence on $\epsilon$ and $\beta$ can be removed in the univariate case. In Subfigure \ref{fig:auto_compare:a}, we plot Fact \ref{fact:ar_model} point-wise (i.e., we set the failure probability to be $\delta$ for each sample size $t$), and in Subfigure \ref{fig:auto_compare:b}, we take a union bound over sample sizes, setting the failure probability to be $\frac{6\delta}{t^2\pi^2}$ for each $t$.
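
Two claims in the captions above can be checked numerically. The sketch below is illustrative only. It assumes the usual definitions of the four CGF-like functions from the confidence-sequence literature, since this summary does not reproduce the paper's definitions and its normalization may differ. The function names are hypothetical. It checks the pointwise ordering $\psi_N \leq \psi_{P, c} \leq \psi_{E, c} \leq \psi_{G, c}$ behind the implications in Figure 1, and that the per-sample failure probabilities $\frac{6\delta}{t^2\pi^2}$ from Figure 4 sum to at most $\delta$.

```python
import math

# Assumed standard definitions of the four CGF-like functions; the paper's
# own normalization is not shown in this summary and may differ.
def psi_normal(lam):
    return lam ** 2 / 2.0

def psi_poisson(lam, c=1.0):
    return (math.exp(c * lam) - c * lam - 1.0) / c ** 2

def psi_exponential(lam, c=1.0):
    return (-math.log(1.0 - c * lam) - c * lam) / c ** 2

def psi_gamma(lam, c=1.0):
    return lam ** 2 / (2.0 * (1.0 - c * lam))

def ordering_holds(c=1.0, n_grid=99):
    """Check psi_N <= psi_P <= psi_E <= psi_G on a grid strictly inside (0, 1/c)."""
    for i in range(1, n_grid + 1):
        lam = i / ((n_grid + 1) * c)
        values = [
            psi_normal(lam),
            psi_poisson(lam, c),
            psi_exponential(lam, c),
            psi_gamma(lam, c),
        ]
        if any(a > b for a, b in zip(values, values[1:])):
            return False
    return True

def union_bound_mass(n_terms):
    """Partial sum of 6 / (pi^2 t^2) for t = 1..n_terms; approaches 1 from below."""
    return sum(6.0 / (math.pi ** 2 * t ** 2) for t in range(1, n_terms + 1))

if __name__ == "__main__":
    print(ordering_holds(c=1.0))
    print(union_bound_mass(100_000))
```

The same series explains the choice $h(k) = (1 + k)^2\zeta(2)$: since $\zeta(2) = \pi^2/6$, the weights $1/h(k)$ sum to one over $k \geq 0$.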

Theorems & Definitions (40)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Proposition 2.4
  • Theorem 3.1
  • Corollary 3.2
  • Corollary 3.3
  • Theorem 4.1
  • Corollary 4.2
  • Corollary 4.5
  • ...and 30 more