Safe Screening Rules for Group OWL Models

Runxue Bao, Quanchao Lu, Yanfu Zhang

TL;DR

This paper proposes the first safe screening rule for Group OWL models. By handling the structured non-separable penalty, the rule quickly identifies the inactive features, i.e., those whose coefficients are zero across all tasks.

Abstract

Group Ordered Weighted $L_{1}$-Norm (Group OWL) regularized models have emerged as a useful procedure for high-dimensional sparse multi-task learning with correlated features. Proximal gradient methods are the standard approach to solving Group OWL models. However, Group OWL models usually suffer from high computational cost and memory usage when the number of features is large. To address this challenge, we propose the first safe screening rule for Group OWL models. By handling the structured non-separable penalty, the rule quickly identifies the inactive features, i.e., those whose coefficients are zero across all tasks. Removing these inactive features during training can yield substantial computational gains and memory savings. Moreover, the proposed screening rule can be directly integrated with existing solvers in both the batch and stochastic settings. Theoretically, we prove that the screening rule is safe and that it can be safely applied within existing iterative optimization algorithms. Our experimental results demonstrate that the screening rule effectively identifies the inactive features and leads to a significant computational speedup without any loss of accuracy.
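To make the terms in the abstract concrete, the following is a minimal sketch of the Group OWL penalty and of what "inactive feature" means. It assumes the usual formulation, in which the coefficient matrix $B$ has one row per feature and one column per task, and the penalty is $\sum_i \lambda_i \|B_{[i]}\|_2$ with row norms sorted in decreasing order and weights $\lambda_1 \ge \dots \ge \lambda_d \ge 0$. The function names `group_owl_penalty` and `inactive_features` are hypothetical and chosen for illustration; this is not the paper's screening rule, only the objects it operates on.

```python
import math


def row_norms(B):
    """Euclidean norm of each row of B (one row per feature, one column per task)."""
    return [math.sqrt(sum(x * x for x in row)) for row in B]


def group_owl_penalty(B, lam):
    """Group OWL penalty under the assumed formulation.

    The i-th largest row norm is paired with the i-th weight, so the penalty
    depends on the ordering of the rows and is not separable across features.
    """
    if len(lam) != len(B):
        raise ValueError("need exactly one weight per feature (row of B)")
    if any(w < 0 for w in lam) or any(a < b for a, b in zip(lam, lam[1:])):
        raise ValueError("weights must be non-negative and non-increasing")
    norms = sorted(row_norms(B), reverse=True)
    return sum(w * n for w, n in zip(lam, norms))


def inactive_features(B, tol=0.0):
    """Indices of features whose coefficients are zero across all tasks."""
    return [i for i, n in enumerate(row_norms(B)) if n <= tol]


# Three features, two tasks; the second feature is zero for every task.
B = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
lam = [2.0, 1.0, 0.5]
penalty = group_owl_penalty(B, lam)  # 2*5 + 1*1 + 0.5*0
inactive = inactive_features(B)
```

Because the weights attach to sorted positions rather than to fixed features, a feature cannot be tested in isolation, which is the non-separability the abstract refers to.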

Paper Structure

This paper contains 20 sections, 2 theorems, 21 equations, 4 figures, 1 table, and 2 algorithms.

Key Result

Corollary 1

Let $\Theta$ be any feasible dual point. Then we have: where $G(B, \Theta) = P(B) - D(\Theta)$, the difference between the primal objective $P$ and the dual objective $D$, is the intermediate duality gap during the training process.

Figures (4)

  • Figure 1: Running time of the algorithms without and with safe screening for Group OWL regression.
  • Figure 2: The screening rate of our screening rule in the batch and stochastic settings for Group OWL regression.
  • Figure 3: Running time of the algorithms without and with safe screening for multinomial OWL regression.
  • Figure 4: The screening rate of our screening rule in both batch and stochastic settings for multinomial OWL regression.

Theorems & Definitions (5)

  • Remark 1
  • Corollary 1
  • proof
  • Theorem 2
  • proof