Implicit Differentiation for Hyperparameter Tuning the Weighted Graphical Lasso

Can Pouliquen; Paulo Gonçalves; Mathurin Massias; Titouan Vayer

TL;DR

This work tunes the hyperparameters of the Graphical Lasso by casting the tuning task as a bilevel optimization problem and computing the hypergradient via implicit differentiation. The core contribution is a closed-form Jacobian of the Graphical Lasso solution with respect to scalar and matrix regularization parameters, obtained by differentiating the fixed-point equation of the proximal update while carefully handling its non-smoothness. The authors extend the scalar case to a matrix of hyperparameters, which yields a fourth-order Jacobian tensor, and show how to reuse a Kronecker-inverse to reduce computation. Empirical results on synthetic data show that the proposed first-order approach can match grid-search in the scalar case and that matrix regularization offers substantial performance gains, although the non-convexity of the resulting problem motivates further optimization refinements.
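The fixed-point differentiation mentioned above relies on derivatives of the proximal operator of the $\ell_1$-norm, which is entrywise soft-thresholding. The following is a minimal NumPy sketch of that building block only, not of the authors' full algorithm. The function names `soft_threshold` and `soft_threshold_grads` are illustrative assumptions, and at the kinks $|z| = \lambda$ the derivative is simply set to 0, which is one possible convention.

```python
import numpy as np


def soft_threshold(z, lam):
    """Entrywise prox of lam * |.|: shrink each entry of z toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def soft_threshold_grads(z, lam):
    """Entrywise derivatives of soft_threshold, valid away from |z| == lam.

    Returns (d/dz, d/dlam). Entries with |z| <= lam are mapped to zero,
    so both derivatives vanish there (assumed convention at the kink).
    """
    active = (np.abs(z) > lam).astype(float)
    return active, -np.sign(z) * active


# Illustrative values (hypothetical, chosen away from the kinks).
z = np.array([-2.0, -0.3, 0.0, 0.4, 1.5])
lam = 0.5
eps = 1e-6

dz, dlam = soft_threshold_grads(z, lam)

# Central finite differences as a sanity check of the analytic derivatives.
fd_z = (soft_threshold(z + eps, lam) - soft_threshold(z - eps, lam)) / (2 * eps)
fd_lam = (soft_threshold(z, lam + eps) - soft_threshold(z, lam - eps)) / (2 * eps)

print(np.allclose(dz, fd_z, atol=1e-6), np.allclose(dlam, fd_lam, atol=1e-6))
```

Only the entries that survive thresholding carry a nonzero derivative, which is why the Jacobian of the solution is naturally restricted to the support of the estimate.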

Abstract

We provide a framework and algorithm for tuning the hyperparameters of the Graphical Lasso via a bilevel optimization problem solved with a first-order method. In particular, we derive the Jacobian of the Graphical Lasso solution with respect to its regularization hyperparameters.
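For orientation, the bilevel problem described above can be written out as follows. This is a sketch assuming the standard Graphical Lasso formulation; the empirical covariance symbol $\mathbf{S}$ and the exact form of the penalty are assumptions, while $\mathcal{C}$ denotes the outer criterion.

```latex
\begin{aligned}
\lambda^{\mathrm{opt}} &\in \operatorname*{arg\,min}_{\lambda > 0} \;
  \mathcal{C}\big(\hat{\boldsymbol{\Theta}}(\lambda)\big)
  \quad \text{s.t.} \quad
  \hat{\boldsymbol{\Theta}}(\lambda) \in \operatorname*{arg\,min}_{\boldsymbol{\Theta} \succ 0} \;
  -\log\det \boldsymbol{\Theta}
  + \langle \mathbf{S}, \boldsymbol{\Theta} \rangle
  + \lambda \|\boldsymbol{\Theta}\|_1 , \\
\frac{\mathrm{d}}{\mathrm{d}\lambda}\,
  \mathcal{C}\big(\hat{\boldsymbol{\Theta}}(\lambda)\big)
  &= \Big\langle \nabla \mathcal{C}\big(\hat{\boldsymbol{\Theta}}(\lambda)\big),\;
  \frac{\partial \hat{\boldsymbol{\Theta}}}{\partial \lambda}(\lambda) \Big\rangle .
\end{aligned}
```

The second line is the chain rule: the hypergradient needed by a first-order method is the inner product of the criterion's gradient with the Jacobian of the solution, and that Jacobian is the quantity the paper derives. In the weighted case, the scalar $\lambda$ would be replaced by a matrix $\mathbf{\Lambda}$ with one weight per entry of $\boldsymbol{\Theta}$.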

Paper Structure (12 sections, 3 theorems, 21 equations, 3 figures)

Introduction
Related work
Notation
The scalar case
Jacobian with respect to $Z$
Jacobian with respect to $\lambda$
Matrix of hyperparameters
Experiments
The criterion and its gradient
Computing the Jacobian
Comparison with grid-search
Matrix regularization

Key Result

Proposition 1

Let $\hat{\boldsymbol{\Theta}}(\lambda)$ be a solution of the Graphical Lasso problem (eq:graphical_lasso). Then, using Fermat's rule and the expression of the subdifferential of the $\ell_1$-norm (beck2017first),

Figures (3)

Figure 1: Value of the criterion $\mathcal{C}$ w.r.t. $\lambda$ for grid-search and our method, along with the oracle RE.
Figure 2: Outer objective value for the bilevel problem along iterations of hypergradient descent.
Figure 3: Visualization of the matrices $\mathbf{\Lambda}^\mathrm{opt}$, $\boldsymbol{\Theta}_\mathrm{true}$ and $\widehat{\boldsymbol{\Theta}}(\mathbf{\Lambda}^\mathrm{opt})$.

Theorems & Definitions (4)

Proposition 1
Proposition 3
proof
Proposition 4
