Adaptive sieving with semismooth Newton proximal augmented Lagrangian algorithm for multi-task Lasso problems

Lanyu Lin, Yong-Jin Liu, Bo Wang, Junfeng Yang

TL;DR

This work tackles the multi-task Lasso problem with an $\ell_{1,\infty}$ constraint, formulated as $\min_W \tfrac{1}{2}\sum_{i=1}^n \|y^i - X^i w^i\|^2$ s.t. $\|W\|_{1,\infty} \le \gamma$. It introduces an adaptive sieving (AS) strategy that generates a solution path by screening out inactive features and solving a sequence of reduced problems with an inexact semismooth Newton proximal augmented Lagrangian (Ssnpal) method, which converges at an asymptotically superlinear rate. The inner problems are solved within a proximal augmented Lagrangian framework by a semismooth Newton method that exploits the generalized HS-Jacobian of the projection onto the $\ell_{1,\infty}$-norm ball, enabling fast convergence on small subproblems. Theoretical results guarantee finite termination of the AS path and convergence of the inner solver, while experiments on synthetic and real data show that AS-Ssnpal achieves substantial speedups and greater robustness relative to Ssnpal, AS-ADMM, and ADMM.
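To make the formulation concrete, the following minimal sketch evaluates the objective and the constraint for a candidate $W$. The summary does not define $\|W\|_{1,\infty}$, so the code assumes one common convention: $W \in \mathbb{R}^{d\times n}$ stores the task vector $w^i$ as column $i$, and the norm is the sum over feature rows of the largest absolute entry in each row. The function names are hypothetical and not taken from the paper.

```python
import numpy as np

def l1_inf_norm(W):
    # Assumed convention: rows index features, columns index tasks.
    # Sum over rows of the maximum absolute entry in each row.
    return float(np.abs(W).max(axis=1).sum())

def multitask_loss(Xs, ys, W):
    # 0.5 * sum_i ||y^i - X^i w^i||^2, where w^i is column i of W.
    return 0.5 * sum(
        float(np.sum((y - X @ W[:, i]) ** 2))
        for i, (X, y) in enumerate(zip(Xs, ys))
    )

def is_feasible(W, gamma, tol=1e-12):
    # Checks the constraint ||W||_{1,inf} <= gamma up to a small tolerance.
    return l1_inf_norm(W) <= gamma + tol
```

Under this convention a feature row that is entirely zero contributes nothing to the norm, which is why the constraint tends to switch whole features off across all tasks at once.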

Abstract

Multi-task learning enhances model generalization by jointly learning from related tasks. This paper focuses on the $\ell_{1,\infty}$-norm constrained multi-task learning problem, which promotes a shared feature representation while inducing sparsity in task-specific parameters. We propose an adaptive sieving (AS) strategy to efficiently generate a solution path for multi-task Lasso problems. Each subproblem along the path is solved via an inexact semismooth Newton proximal augmented Lagrangian (Ssnpal) algorithm, achieving an asymptotically superlinear convergence rate. By exploiting the Karush-Kuhn-Tucker (KKT) conditions and the inherent sparsity of multi-task Lasso solutions, the Ssnpal algorithm solves a sequence of reduced subproblems with small dimensions. This approach enables our method to scale effectively to large problems. Numerical experiments on synthetic and real-world datasets demonstrate the superior efficiency and robustness of our algorithm compared to state-of-the-art solvers.
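The sieving idea (solve a small reduced problem, check the KKT conditions on the discarded features, add back any violators, repeat) can be illustrated on a much simpler problem than the one treated in the paper. The sketch below is not the authors' algorithm: it assumes a single-task, penalized Lasso $\min_w \tfrac12\|y - Xw\|^2 + \lambda\|w\|_1$ and swaps in plain proximal gradient (ISTA) as the inner solver in place of Ssnpal. All names and parameters are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal map of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, iters=5000):
    # Stand-in inner solver (not Ssnpal): proximal gradient with step 1/L.
    w = np.zeros(X.shape[1])
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    if L == 0.0:
        return w
    for _ in range(iters):
        w = soft_threshold(w + X.T @ (y - X @ w) / L, lam / L)
    return w

def adaptive_sieving_lasso(X, y, lam, init_size=5, tol=1e-6):
    d = X.shape[1]
    # Initial guess of the active set: features most correlated with y.
    active = np.zeros(d, dtype=bool)
    active[np.argsort(-np.abs(X.T @ y))[:init_size]] = True
    w = np.zeros(d)
    while True:
        # Solve the reduced problem restricted to the current index set.
        w[:] = 0.0
        w[active] = ista_lasso(X[:, active], y, lam)
        # KKT check on excluded features: need |x_j^T r| <= lam.
        r = y - X @ w
        violated = (~active) & (np.abs(X.T @ r) > lam + tol)
        if not violated.any():
            return w, active
        # Sieve the violating features back in and re-solve.
        active |= violated
```

The loop stops after at most $d$ rounds because the index set grows strictly whenever it does not terminate, which mirrors, in a simplified setting, the finite-termination property stated for the AS path.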

Paper Structure

This paper contains 15 sections, 5 theorems, 87 equations, 6 algorithms.

Key Result

Theorem 2.1

Let $b, q\in \mathbb{R}^{nd}$ and $\gamma>0$ be given. Denote Then, the matrix $N_0$ given in (equa:N0) admits the closed-form expression: where $G\in\mathbb{R}^{nd\times nd}$ and $f\in\mathbb{R}^{nd}$ are given by for $i, j=1,\dots,nd$, and

Theorems & Definitions (8)

  • Theorem 2.1
  • Proposition 3.1
  • proof
  • Theorem 3.1
  • proof
  • Theorem 4.1
  • Theorem 4.2
  • proof