A variational approach to dimension-free self-normalized concentration

Ben Chugg, Aaditya Ramdas

TL;DR

The paper addresses self-normalized concentration for vector-valued processes under sub-$\psi$ tail conditions, aiming for dimension-free, determinant-based bounds that scale with $\log\det V_\tau$ rather than the dimension or condition number. It develops a variational (PAC-Bayes) framework that recovers classical sub-Gaussian results, then extends to general sub-$\psi$ processes via a line-crossing inequality and a stitching technique to achieve time-uniform bounds. It further derives self-normalized Bernstein and Bennett inequalities, and introduces an empirical Bernstein bound that adapts to unknown variance in a dimension-free setting. The results bridge determinant-based and condition-number-based bounds, enabling robust concentration results with practical implications for structured, ill-conditioned vector processes in areas like bandits, system identification, and time-series analysis.

Abstract

We study the self-normalized concentration of vector-valued stochastic processes. We focus on bounds for "sub-$\psi$" processes, a well-known and quite general class of process that encompasses a wide variety of well-known tail conditions (including sub-exponential, sub-Gaussian, sub-gamma, sub-Poisson, and several heavy-tailed settings without a moment generating function such as symmetric or bounded 2nd or 3rd moments). Our results recover and generalize the influential bound of de la Peña et al. [20] (proved again in Abbasi-Yadkori et al. [2]) in the sub-Gaussian case. Further, we fill a gap in the literature between determinant-based bounds and more recent bounds based on condition numbers. As applications we prove a Bernstein inequality for random vectors satisfying a moment condition (a more general condition than boundedness), and also provide the first dimension-free self-normalized empirical Bernstein inequality. Our techniques are based on the variational (PAC-Bayes) approach to concentration.
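
The sub-Gaussian bound mentioned above can be checked numerically. The following is a minimal sketch, assuming the standard statement from Abbasi-Yadkori et al.: with $S_t = \sum_{s\leq t}\eta_s X_s$, $V_t = V_0 + \sum_{s\leq t} X_s X_s^\intercal$, and conditionally $\sigma$-sub-Gaussian noise $\eta_s$, with probability at least $1-\delta$, simultaneously for all $t$, $\|S_t\|_{V_t^{-1}}^2 \leq 2\sigma^2 \log\big(\det(V_t)^{1/2}\det(V_0)^{-1/2}/\delta\big)$. The function names, the Gaussian noise, and the random unit-vector design are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def self_normalized_radius_sq(V_t, V_0, sigma, delta):
    """Right-hand side 2*sigma^2*log(det(V_t)^{1/2} det(V_0)^{-1/2} / delta)."""
    # slogdet avoids overflow when the determinant is very large.
    _, logdet_t = np.linalg.slogdet(V_t)
    _, logdet_0 = np.linalg.slogdet(V_0)
    return 2.0 * sigma**2 * (0.5 * (logdet_t - logdet_0) + np.log(1.0 / delta))

def simulate(d=5, horizon=200, sigma=1.0, delta=0.05, seed=0):
    """Return the largest observed ratio ||S_t||^2_{V_t^{-1}} / bound over one path."""
    rng = np.random.default_rng(seed)
    V_0 = np.eye(d)          # assumed regularizer, matching U_0 = I_d in the figures
    V = V_0.copy()
    S = np.zeros(d)
    worst_ratio = 0.0
    for _ in range(horizon):
        x = rng.normal(size=d)
        x /= np.linalg.norm(x)           # hypothetical design: random unit vectors
        eta = sigma * rng.normal()       # Gaussian noise is sigma-sub-Gaussian
        S += eta * x
        V += np.outer(x, x)
        lhs = S @ np.linalg.solve(V, S)  # squared self-normalized norm of S_t
        rhs = self_normalized_radius_sq(V, V_0, sigma, delta)
        worst_ratio = max(worst_ratio, lhs / rhs)
    return worst_ratio
```

A ratio below 1 along a path means the bound held at every step of that path; by the guarantee, this should fail on at most a $\delta$ fraction of paths.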

Paper Structure

This paper contains 27 sections, 16 theorems, 150 equations, 4 figures, 1 table.

Key Result

Proposition 2.2

Let $\Theta$ be a measurable parameter space. For each $\theta\in\Theta$, let $Z(\theta) \equiv (Z_t(\theta))_{t\geq 0}$ be a stochastic process upper bounded by a nonnegative supermartingale $L(\theta)\equiv (L_t(\theta))_{t\geq 0}$. Assume that all processes are adapted to the same filtration $\ma

Figures (4)

  • Figure 1: Left: The growth of $g_t$ for various $\psi$ functions and across various values of $\lambda$. For sub-gamma, sub-Poisson, and sub-exponential we fix $c=1/4$. Right: A comparison of Theorem \ref{thm:sub-gaussian} (dotted black line) and Theorem \ref{thm:sub-psi} (various colored lines) instantiated for $\psi=\psi_N$. In both figures we use $U_0 = I_d$ and $V_t = \sum_{s\leq t}X_s X_s^\intercal$ where the vectors $X_t$ are chosen based on a bandit algorithm. Details may be found in Appendix \ref{app:experimental-details}.
  • Figure 2: A comparison of Theorem \ref{thm:sub-psi} with the bound of \cite{whitehouse2023time}, which we recall is based on the condition number of $V_\tau$. We control the growth of the determinant with rank-$k$ updates: as $k$ grows, so does $\det V_\tau$. The condition number has the same growth in each case. As the sample size grows, Theorem \ref{thm:sub-psi} outperforms the condition number-based bound for smaller values of $\det V_\tau$, though the performance depends substantially on the choice of $\lambda$. It loses ground as the determinant grows relative to $d \log \kappa(V_\tau)$. We use $d = 20$, $c=1$, and $U_0 = I_d$. Full simulation details can be found in Appendix \ref{app:experimental-details}.
  • Figure 3: Left: Comparison of $\psi_{E,1}$, $\psi_{G,1}$, and $\psi_{P,1}$. As shown in Lemma \ref{lem:psiE<=psiG}, $\psi_{G,1}$ dominates $\psi_{E,1}$ for all $\lambda\in[0,1)$. Right: Comparison of $\psi_{P,1}^{-1}$ and $\psi_{G,1}^{-1}$, which define the bounds for $\lambda$ in Corollary \ref{cor:bennett-and-bernstein}.
  • Figure 4: Comparison of our empirical Bernstein bound, Corollary \ref{cor:empirical-bernstein} (blue line), with the empirical Bernstein bound of \cite{whitehouse2023time} (dotted line). Here each $X_t$ is generated to have random uniform noise in $k$ directions, so $V_{t+1} = V_t + (X_{t+1} - \widehat{\mu}_{t})(X_{t+1} - \widehat{\mu}_{t})^\top$ acts roughly as a rank-$k$ update. As $k$ grows, the determinant grows relative to the condition number, thus our determinant-based bound deteriorates with respect to the condition number bound. Simulation details can be found in Appendix \ref{app:experimental-details}.
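
The trade-off described in the captions of Figures 2 and 4 (the determinant grows with $k$ while the condition number does not) can be illustrated with a toy computation. This is only a sketch under simplifying assumptions: it places an equal mass $n$ on $k$ coordinate directions on top of $U_0 = I_d$, which is not the paper's simulation setup. The function name and the value of $n$ are hypothetical.

```python
import numpy as np

def logdet_vs_condition(d, k, n):
    """Compare log det V with d * log kappa(V) for V = I_d + n * sum_{i<k} e_i e_i^T.

    Assumption: mass n is added along the first k coordinate axes only,
    so V has k eigenvalues equal to 1 + n and d - k eigenvalues equal to 1.
    """
    V = np.eye(d)
    idx = np.arange(k)
    V[idx, idx] += n                      # rank-k perturbation of the identity
    _, logdet = np.linalg.slogdet(V)      # equals k * log(1 + n)
    kappa = np.linalg.cond(V)             # equals 1 + n whenever 1 <= k < d
    return logdet, d * np.log(kappa)

for k in (1, 5, 19):
    logdet, d_log_kappa = logdet_vs_condition(d=20, k=k, n=1000)
    print(f"k={k:2d}  logdet={logdet:8.2f}  d*log(kappa)={d_log_kappa:8.2f}")
```

In this toy setting $\log\det V$ scales linearly in $k$ while $d\log\kappa(V)$ stays fixed, so a determinant-based quantity is much smaller than the condition-number-based one when only a few directions are excited, and the gap closes as $k$ approaches $d$.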

Theorems & Definitions (26)

  • Definition 2.1: Sub-$\psi$ process in $\mathbb{R}^d$
  • Proposition 2.2: Variational Template
  • Theorem 3.1
  • Theorem 4.1
  • Remark 4.2
  • Remark 4.3
  • Theorem 4.4
  • Remark 4.5
  • Remark 4.6
  • Example 4.7
  • ...and 16 more