Stabilizing Linear Passive-Aggressive Online Learning with Weighted Reservoir Sampling

Skyler Wu, Fred Lu, Edward Raff, James Holt

TL;DR

The paper proposes a weighted reservoir sampling (WRS) approach that obtains a stable ensemble model from the sequence of online solutions without requiring additional passes over the data, hold-out sets, or a growing amount of memory, and shows that the risk of the ensemble classifier is bounded with respect to the regret of the underlying online learning method.

Abstract

Online learning methods, like the seminal Passive-Aggressive (PA) classifier, are still highly effective for high-dimensional streaming data, out-of-core processing, and other throughput-sensitive applications. Many such algorithms rely on fast adaptation to individual errors as a key to their convergence. While such algorithms enjoy low theoretical regret, in real-world deployment they can be sensitive to individual outliers that cause the algorithm to over-correct. When such outliers occur at the end of the data stream, this can cause the final solution to have unexpectedly low accuracy. We design a weighted reservoir sampling (WRS) approach to obtain a stable ensemble model from the sequence of solutions without requiring additional passes over the data, hold-out sets, or a growing amount of memory. Our key insight is that good solutions tend to be error-free for more iterations than bad solutions, and thus, the number of passive rounds provides an estimate of a solution's relative quality. Our reservoir thus contains $K$ previous intermediate weight vectors with high survival times. We demonstrate our WRS approach on the Passive-Aggressive Classifier (PAC) and First-Order Sparse Online Learning (FSOL), where our method consistently and significantly outperforms the unmodified approach. We show that the risk of the ensemble classifier is bounded with respect to the regret of the underlying online learning method.
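
The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and it rests on several assumptions. It uses a binary PA classifier with hinge loss and labels in {-1, +1}. A round counts as passive when the hinge loss is zero. The reservoir is maintained with the Efraimidis-Spirakis key rule (key = u^(1/weight), keep the K largest keys), using the number of passive rounds survived as the sampling weight. The ensemble is a simple average of the reservoir members. The function name pa_wrs_train and its parameters are hypothetical.

```python
import heapq
import random


def pa_wrs_train(stream, dim, K=4, seed=0):
    """Sketch: binary PA classifier plus a weighted reservoir of past solutions.

    Returns (final_w, ensemble_w). All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    survival = 0    # passive rounds survived by the current weight vector
    reservoir = []  # min-heap of (key, tiebreak, weight_vector), size <= K
    offered = 0     # tiebreak counter so the heap never compares vectors

    for x, y in stream:  # y is assumed to be -1 or +1
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        loss = max(0.0, 1.0 - margin)
        if loss == 0.0:
            survival += 1  # passive round: the current solution survives
            continue

        # Aggressive round: offer the outgoing solution to the reservoir,
        # weighted by its survival time (zero-weight candidates are skipped).
        if survival > 0:
            key = rng.random() ** (1.0 / survival)
            offered += 1
            entry = (key, offered, list(w))
            if len(reservoir) < K:
                heapq.heappush(reservoir, entry)
            elif key > reservoir[0][0]:
                heapq.heapreplace(reservoir, entry)

        # Standard PA update: the smallest step that brings the loss to zero.
        sq_norm = sum(xi * xi for xi in x)
        if sq_norm > 0.0:
            tau = loss / sq_norm
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
        survival = 0

    if not reservoir:  # nothing was sampled; fall back to the final solution
        return w, list(w)
    members = [entry[2] for entry in reservoir]
    ensemble = [sum(col) / len(members) for col in zip(*members)]
    return w, ensemble
```

Memory stays fixed at K weight vectors regardless of stream length, and no second pass or hold-out set is needed, which matches the constraints stated in the abstract. For simplicity, this sketch never offers the final weight vector to the reservoir.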

Paper Structure

This paper contains 28 sections, 11 theorems, 30 equations, 30 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Let $\mathbf{w}^{(1)}, \ldots, \mathbf{w}^{(k)} \in \mathcal{R}$ be the updated outputs of an online PA algorithm on inputs $Z_1^T \sim \mathcal{D}$. Also define $r_m$ as the minimal achievable risk of any model, such that $r_m \leq R_\mathcal{D}(\tilde{\mathbf{w}})$ almost surely. Then

Figures (30)

  • Figure 1: Test accuracies ($y$-axis) over timestep ($x$-axis) for PAC-WRS and FSOL-WRS on Avazu (App) and News20. Light grey lines: test accuracies of the baseline methods --- PAC or FSOL --- at each timestep. Solid black lines: test accuracies of the "oracle" models, computed as the cumulative maximum of the baselines. Solid blue lines: test accuracies of WRS-enhanced models. Note massive fluctuations of grey lines and stability of blue lines. All variants shown are using standard sampling weights for WRS, with simple-averaging.
  • Figure 2: Relative oracle performances ($y$-axis) of base PAC and PAC-WRS using standard weights over reservoir sizes $K$ ($x$-axis) on 3 representative datasets. Error bars represent the minimum and maximum values achieved across 5 randomized trials. Blue: WRS-augmented variants via simple average of reservoir members. Red: WRS-augmented variants via weighted average of reservoir members. Dotted lines: voting-based zeroing was performed for additional sparsity. Lower values indicate more stable performance.
  • Figure 3: Final test accuracies ($y$-axis) of base FSOL and FSOL-WRS using standard weights over reservoir sizes $K$ ($x$-axis) on 3 representative datasets. Error bars represent the minimum and maximum values achieved across 5 randomized trials. See Figure 2 for legend description.
  • Figure 4: Final sparsities ($y$-axis) of base FSOL and FSOL-WRS using standard weights over reservoir sizes $K$ ($x$-axis) on 3 representative datasets. Error bars represent the minimum and maximum values achieved across 5 randomized trials. See Figure 2 for legend description.
  • Figure 5: Test accuracies ($y$-axis) over timestep ($x$-axis) for PAC/FSOL and PAC/FSOL-WRS on the EMBER benchmark dataset for malware classification.
  • ...and 25 more figures

Theorems & Definitions (20)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • proof
  • Theorem 6
  • proof
  • Remark
  • ...and 10 more