A Novel Framework for Online Supervised Learning with Feature Selection

Lizhe Sun, Mingyuan Wang, Siquan Zhu, Adrian Barbu

TL;DR

This paper proposes a framework for online learning based on running averages and introduces online versions of several popular offline regularized methods within it, including Lasso, Elastic Net, Minimax Concave Penalty (MCP), and Feature Selection with Annealing (FSA).

Abstract

Current online learning methods suffer from issues such as slower convergence and a limited ability to select important features compared to their offline counterparts. In this paper, a novel framework for online learning based on running averages is proposed. Online versions of many popular offline regularized methods, such as Lasso, Elastic Net, Minimax Concave Penalty (MCP), and Feature Selection with Annealing (FSA), are introduced in this framework. The equivalence between the proposed online methods and their offline counterparts is proved, and novel theoretical guarantees for true support recovery and convergence are provided for some of the methods in this framework. Numerical experiments indicate that the proposed methods achieve high true support recovery accuracy and a faster convergence rate than conventional online and offline algorithms. Finally, applications to large datasets are presented, where the proposed framework again shows competitive results compared to popular online and offline algorithms.

Paper Structure

This paper contains 20 sections, 7 theorems, 44 equations, 6 figures, 6 tables, 3 algorithms.

Key Result

Proposition 3.1

Consider the general penalized regression problem, in which $\boldsymbol{\beta} \in \mathbb{R}^p$ is the coefficient vector and $\mathbf{P}(\boldsymbol{\beta}; \lambda) = \sum_{j=1}^{p}\mathbf{P}(\beta_j ; \lambda)$ is a penalty function. This problem is equivalent to an online optimization problem expressed in terms of the running averages (RAVEs).
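One way to see why an equivalence of this kind can hold is sketched below. It assumes the unpenalized part of the objective is the squared-error loss $\frac{1}{2n}\|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|^2$, which is an assumption, since the loss is not shown in this summary. Under that assumption the loss depends on the data only through the averages of $\mathbf{x}\mathbf{x}^T$, $\mathbf{x}y$, and $y^2$, so adding any penalty $\mathbf{P}(\boldsymbol{\beta}; \lambda)$ to both forms leaves them equal. The names `update_raves`, `batch_loss`, `rave_loss`, and the state keys `S_xx`, `S_xy`, `S_yy` are hypothetical and chosen for illustration only.

```python
import numpy as np

def update_raves(state, x, y):
    """Fold one observation (x, y) into the running averages (incremental mean)."""
    n = state["n"] + 1
    w = 1.0 / n
    state["S_xx"] += w * (np.outer(x, x) - state["S_xx"])  # average of x x^T
    state["S_xy"] += w * (x * y - state["S_xy"])           # average of x * y
    state["S_yy"] += w * (y * y - state["S_yy"])           # average of y^2
    state["n"] = n
    return state

def batch_loss(X, y, beta):
    """Assumed offline loss: (1 / 2n) * ||y - X beta||^2, needs all the data."""
    r = y - X @ beta
    return 0.5 * np.mean(r ** 2)

def rave_loss(state, beta):
    """The same loss written using only the running averages."""
    quad = 0.5 * beta @ state["S_xx"] @ beta
    return quad - beta @ state["S_xy"] + 0.5 * state["S_yy"]

# Synthetic data (illustrative values, not taken from the paper).
rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=n)

# Stream the observations one at a time; the raw data is never stored.
state = {"n": 0, "S_xx": np.zeros((p, p)), "S_xy": np.zeros(p), "S_yy": 0.0}
for xi, yi in zip(X, y):
    update_raves(state, xi, yi)

# Compare the two losses at an arbitrary coefficient vector.
beta = rng.normal(size=p)
gap = abs(batch_loss(X, y, beta) - rave_loss(state, beta))
```

Here `gap` should be at the level of floating-point rounding. The term `0.5 * S_yy` does not depend on $\boldsymbol{\beta}$, so it can be dropped when minimizing, and memory use is $O(p^2)$ regardless of how many observations have been seen.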

Figures (6)

  • Figure 1: Diagram of the running averages-based methods. The RAVEs are updated as the data is received. The model can be extracted from the running average statistics at any time.
  • Figure 2: Plot of the variable detection rate (DR) vs $n$ in regression, for $k=50$ true variables. Top: strong signal $\beta=1$, bottom: weak signal $\beta=0.01$.
  • Figure 3: Plot of the variable detection rate (DR) vs $n$ in classification, for $k=50$ true variables. Top: strong signal $\beta=1$, bottom: weak signal $\beta=0.01$.
  • Figure 4: Regret for TSGD, SADMM, and the running averages-based methods, averaged over 20 runs. Left: strong signal ($\beta = 1$), middle: medium signal ($\beta = 0.1$), right: weak signal ($\beta = 0.01$).
  • Figure 5: Model adaptation experiment. Upper left: true signal. Upper right: estimated parameters without adaptation. Bottom left: RMSE for prediction. Bottom right: estimated parameters with adaptation.
  • ...and 1 more figure
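The workflow described in the Figure 1 caption (update the running averages as data arrives, extract a model at any time) can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes a squared-error loss with an $\ell_1$ penalty and uses plain cyclic coordinate descent as the solver. The class `RunningAverages`, the function `lasso_from_raves`, the batch size, and the value of `lam` are all hypothetical choices.

```python
import numpy as np

class RunningAverages:
    """Keeps the averages of x x^T and x * y over all observations seen so far."""

    def __init__(self, p):
        self.n = 0
        self.S_xx = np.zeros((p, p))
        self.S_xy = np.zeros(p)

    def update(self, X_batch, y_batch):
        """Fold a mini-batch into the running averages, then discard the batch."""
        m = X_batch.shape[0]
        total = self.n + m
        self.S_xx = (self.n * self.S_xx + X_batch.T @ X_batch) / total
        self.S_xy = (self.n * self.S_xy + X_batch.T @ y_batch) / total
        self.n = total

def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_from_raves(ra, lam, n_sweeps=200):
    """Minimize 0.5 b^T S_xx b - b^T S_xy + lam * ||b||_1 by coordinate descent."""
    p = ra.S_xy.shape[0]
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            if ra.S_xx[j, j] == 0.0:
                continue  # feature never observed with a nonzero value
            # Correlation of feature j with the residual that excludes feature j.
            rho = ra.S_xy[j] - ra.S_xx[j] @ beta + ra.S_xx[j, j] * beta[j]
            beta[j] = soft_threshold(rho, lam) / ra.S_xx[j, j]
    return beta

# Synthetic stream: 2 relevant features out of 8 (illustrative values only).
rng = np.random.default_rng(1)
p = 8
true_beta = np.zeros(p)
true_beta[:2] = [1.0, -1.0]

ra = RunningAverages(p)
for _ in range(50):
    Xb = rng.normal(size=(20, p))
    yb = Xb @ true_beta + 0.1 * rng.normal(size=20)
    ra.update(Xb, yb)
    beta_hat = lasso_from_raves(ra, lam=0.1)  # a model is available after every batch

selected = np.flatnonzero(beta_hat)  # indices of the features kept by the penalty
```

The solver never touches the raw observations, only `S_xx` and `S_xy`, which is what allows the model to be re-extracted at any point in the stream at a cost that depends on $p$ but not on $n$.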

Theorems & Definitions (11)

  • Proposition 3.1
  • Proof
  • Remark 3.1
  • Theorem 4.1
  • Remark 4.1
  • Definition 4.1
  • Proposition 4.1
  • Theorem 4.2
  • Corollary 4.1
  • Proposition 4.2
  • ...and 1 more