Online Quantile Regression

Yinan Shen, Dong Xia, Wen-Xin Zhou

TL;DR

The paper tackles online quantile regression under sequential data with unknown horizon and limited memory. It develops an adaptive online stochastic sub-gradient method on the check loss $\rho_{Q,\tau}$ and analyzes three data-arrival regimes (online, batch, and infinite storage), achieving horizon-agnostic, minimax-optimal error rates with exponential tail guarantees. A two-phase stepsize strategy is shown to mitigate initialization effects, yielding fast linear convergence initially and near-optimal $O(d/T)$ error as more data arrive, even under heavy-tailed noise. The results demonstrate robust, scalable QR in streaming contexts, offering strong theoretical guarantees and practical efficiency compared to offline methods, with extensions to broader online regression settings discussed.
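For readers unfamiliar with the notation, the following is a sketch of the objects the summary refers to, assuming the standard definition of the check loss at quantile level $\tau\in(0,1)$; the paper's exact notation and normalization may differ. With residual $u = Y_t-{\mathbf X}_t^{\top}{\boldsymbol{\beta}}$, the loss and one valid sub-gradient step with stepsize $\eta_t$ are

$$\rho_{Q,\tau}(u)=\tau\,u\,\mathbb{1}(u\geq 0)+(1-\tau)\,(-u)\,\mathbb{1}(u<0),$$

$${\boldsymbol{\beta}}_{t+1}={\boldsymbol{\beta}}_{t}-\eta_t\,{\mathbf g}_t,\qquad {\mathbf g}_t=-\big(\tau-\mathbb{1}(Y_t-{\mathbf X}_t^{\top}{\boldsymbol{\beta}}_t<0)\big)\,{\mathbf X}_t\in\partial_{{\boldsymbol{\beta}}}\,\rho_{Q,\tau}(Y_t-{\mathbf X}_t^{\top}{\boldsymbol{\beta}}_t).$$

Because $|\tau-\mathbb{1}(\cdot)|\leq 1$, the sub-gradient norm is bounded by $\|{\mathbf X}_t\|_2$ regardless of how large the noise is, which is one intuition for why the method tolerates heavy-tailed noise.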

Abstract

This paper addresses the challenge of integrating sequentially arriving data within the quantile regression framework, where the number of features is allowed to grow with the number of observations, the horizon is unknown, and memory is limited. We employ stochastic sub-gradient descent to minimize the empirical check loss and study its statistical properties and regret performance. In our analysis, we unveil the delicate interplay between updating iterates based on individual observations versus batches of observations, revealing distinct regularity properties in each scenario. Our method ensures long-term optimal estimation irrespective of the chosen update strategy. Importantly, our contributions go beyond prior works by achieving exponential-type concentration inequalities and attaining optimal regret and error rates that exhibit only short-term sensitivity to initial errors. A key insight of our study, made possible by delicate statistical analyses, is that appropriate stepsize schemes significantly mitigate the impact of initial errors on subsequent errors and regrets. This underscores the robustness of stochastic sub-gradient descent in handling initial uncertainties, emphasizing its efficacy in scenarios where the sequential arrival of data introduces uncertainties regarding both the horizon and the total number of observations. Additionally, when the initial error is well-controlled, there is a trade-off between short-term error rate and long-term optimality. Because a comparably delicate statistical analysis is lacking for the squared loss, we also briefly discuss its properties and proper stepsize schemes. Extensive simulations support our theoretical findings.
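As a rough illustration of the idea in the abstract, the sketch below runs one-sample stochastic sub-gradient descent on the check loss with a two-phase stepsize. It is not the authors' implementation: the function names, the schedule shape (geometric decay followed by an $O(1/t)$-type decay), and every constant (`eta0`, `decay`, `t_switch`, `c`, `offset`) are assumptions chosen for this synthetic demo, not values from the paper.

```python
import numpy as np

def check_loss_subgradient(beta, x, y, tau):
    """Return one sub-gradient (w.r.t. beta) of the check loss at y - x @ beta."""
    residual = y - x @ beta
    # Slope of the check loss in the residual: tau if residual >= 0, tau - 1 otherwise.
    return -(tau - float(residual < 0)) * x

def two_phase_stepsize(t, eta0=0.5, decay=0.99, t_switch=500, c=10.0, offset=200.0):
    """Hypothetical schedule: geometric decay first, then an O(1/t)-type decay."""
    if t < t_switch:
        return eta0 * decay ** t
    return c / (t - t_switch + offset)

def online_quantile_sgd(stream, d, tau=0.5):
    """Consume (x, y) pairs one at a time; only the current iterate is kept."""
    beta = np.zeros(d)
    for t, (x, y) in enumerate(stream):
        beta = beta - two_phase_stepsize(t) * check_loss_subgradient(beta, x, y, tau)
    return beta

# Synthetic demo (assumed setup): Gaussian covariates, heavy-tailed Student t noise.
# The data are pre-generated here only to simulate a stream.
rng = np.random.default_rng(0)
d, T = 5, 20000
beta_star = np.ones(d)
X = rng.standard_normal((T, d))
Y = X @ beta_star + rng.standard_t(1.1, size=T)

beta_hat = online_quantile_sgd(zip(X, Y), d, tau=0.5)
rel_err = np.linalg.norm(beta_hat - beta_star) / np.linalg.norm(beta_star)
print(f"relative error after {T} samples: {rel_err:.4f}")
```

The update never looks at the horizon `T`, which mirrors the horizon-agnostic setting described above; the switch point between phases is fixed by hand here, whereas the paper derives its stepsize scheme from theory.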

Paper Structure (23 sections, 19 theorems, 173 equations, 9 figures, 1 table)

Key Result

Theorem 1

Under Assumptions assump:sensing_operators:vec and assump:heavy-tailed (a), there exist universal positive constants $c_0,c_1,c_2,c_3,C_0$ such that, for an arbitrary initialization ${\boldsymbol{\beta}}_{0} \in{\mathbb R}^d$, the sequence $\{{\boldsymbol{\beta}}_t\}_{t\geq 1}$ generated by the onl

Figures (9)

  • Figure 1: Expected excess risk of the objective function. $Y$-axis: lower bound of ${\mathbb E}[f_t({\boldsymbol{\beta}})-f_t({\boldsymbol{\beta}}^*) ]$; $X$-axis: the value of $\|{\boldsymbol{\beta}}-{\boldsymbol{\beta}}^{\ast}\|_2$. It shows that the lower bound varies from a linear to a quadratic dependence on $\|{\boldsymbol{\beta}}-{\boldsymbol{\beta}}^{\ast}\|_2$ as ${\boldsymbol{\beta}}$ gets closer to ${\boldsymbol{\beta}}^*$.
  • Figure 2: Trade-off between short-term accuracy and long-term optimality. The initialization ${\boldsymbol{\beta}}_0=\boldsymbol{0}$ is already in proximity to ${\boldsymbol{\beta}}^{\ast}$, enabling the online sub-gradient descent algorithm to bypass the first phase and start the second-phase iterations immediately. A small stepsize can ensure short-term accuracy but may compromise long-term optimality, whereas a large stepsize can guarantee long-term optimality but at the expense of short-term accuracy. Here, the dimension is set to $d=100$, and the noise follows a Student $t_{\nu}$ distribution with $\nu=1.1$.
  • Figure 3: Expected length of the sub-gradient for the empirical quantile loss. $Y$-axis: upper bound of ${\mathbb E}[\| {\mathbf g} \|_{2}^2|{\boldsymbol{\beta}}]$, where ${\mathbf g}\in\partial \frac{1}{n}\sum_{i=1}^{n} \rho_{Q,\tau}(Y_i-{\mathbf X}_i^{\top}{\boldsymbol{\beta}})$; $X$-axis: $\|{\boldsymbol{\beta}}-{\boldsymbol{\beta}}^{\ast}\|_2$. It reveals three phases of properties associated with the sub-gradient, depending on the closeness between ${\boldsymbol{\beta}}$ and ${\boldsymbol{\beta}}^{\ast}$.
  • Figure 4: Relative error versus time/iterations in online (one-sample) learning ($n_t\equiv 1$). The dimension is $d=100$, the unknown horizon is $T=10^5$, and the quantile loss parameter is $\tau=1/2$. The convergence of online sub-gradient descent is examined under three stepsize schemes: Statistical denotes our stepsize scheme guided by Theorem \ref{thm:one_sample}; Constant denotes the stepsize scheme $\eta_t\equiv \text{const}$ \cite{zhang2004solving}; and $O(1/t)$ denotes the decaying stepsize scheme $\eta_t=O(1/t)$ \cite{duchi2009efficient}. Left (a): moderate SNR; right (b): strong SNR.
  • Figure 5: Error rates of offline and online regression using the quantile loss $\rho_{Q,\tau}(\cdot)$. Online One Sample refers to the online learning algorithm studied in Section \ref{sec:onesample}. The dimension is $d=100$, the total sample size is $n=20,000$, the batch size is $n_t\equiv 100$, and the noise has a $t_{1.1}$ distribution. Box plots are based on 50 independent simulations.
  • ...and 4 more figures

Theorems & Definitions (27)

  • Theorem 1
  • Corollary 1
  • Remark 1: Trade-off between short-term accuracy and long-term optimality
  • Theorem 2
  • Theorem 3
  • Corollary 2
  • Theorem 4
  • Theorem 5
  • Corollary 3
  • Lemma 1
  • ...and 17 more