Schur's Positive-Definite Network: Deep Learning in the SPD cone with structure

Can Pouliquen; Mathurin Massias; Titouan Vayer

TL;DR

This work tackles the estimation of symmetric positive-definite (SPD) matrices under additional structural constraints such as elementwise sparsity. It introduces SpodNet, an SPD-to-SPD learning module that guarantees SPD outputs via Schur's condition and accommodates further structural constraints through learned updates of column-row pairs and diagonal entries. The framework is instantiated in three architectures (UBG, PNP, and E2E), each motivated by proximal block coordinate descent, and includes a stability-preserving update strategy. Empirically, SpodNet matches or outperforms traditional estimators in sparse precision-matrix recovery and yields coherent graphs in unsupervised graph learning on real-world data.

Abstract

Estimating matrices in the symmetric positive-definite (SPD) cone is of interest for many applications ranging from computer vision to graph learning. While there exist various convex optimization-based estimators, they remain limited in expressivity due to their model-based approach. The success of deep learning motivates the use of learning-based approaches to estimate SPD matrices with neural networks in a data-driven fashion. However, designing effective neural architectures for SPD learning is challenging, particularly when the task requires additional structural constraints, such as element-wise sparsity. Current approaches either do not ensure that the output meets all desired properties or lack expressivity. In this paper, we introduce SpodNet, a novel and generic learning module that guarantees SPD outputs and supports additional structural constraints. Notably, it solves the challenging task of learning jointly SPD and sparse matrices. Our experiments illustrate the versatility and relevance of SpodNet layers for such applications.
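To make the difficulty of jointly enforcing sparsity and positive-definiteness concrete, here is a minimal NumPy sketch. The matrix `a`, the threshold `0.5`, and the helper `hard_threshold_offdiag` are illustrative choices that do not come from the paper. The example shows that zeroing small off-diagonal entries of an SPD matrix can push its smallest eigenvalue below zero.

```python
import numpy as np

# Hypothetical 3x3 SPD matrix chosen for illustration (not from the paper).
a = np.array([[1.0, 0.75, 0.4],
              [0.75, 1.0, 0.75],
              [0.4, 0.75, 1.0]])


def hard_threshold_offdiag(m, tau):
    """Zero every off-diagonal entry with magnitude below tau; keep the diagonal."""
    out = np.where(np.abs(m) >= tau, m, 0.0)
    np.fill_diagonal(out, np.diag(m))
    return out


b = hard_threshold_offdiag(a, 0.5)  # only the 0.4 entries are removed

eig_a = np.linalg.eigvalsh(a).min()  # positive: a is SPD
eig_b = np.linalg.eigvalsh(b).min()  # 1 - 0.75 * sqrt(2), about -0.0607

print(f"min eigenvalue before thresholding: {eig_a:.4f}")
print(f"min eigenvalue after thresholding:  {eig_b:.4f}")
```

The thresholded matrix is sparser but no longer positive-definite, which is why sparsity cannot simply be applied as a post-processing step on an SPD estimate.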

Paper Structure (38 sections, 2 theorems, 13 equations, 9 figures, 1 algorithm)

Introduction
Notation
Related works
Riemannian approaches
SPD layers
Unrolled neural architectures
The SpodNet framework
Algorithmic foundations
Improving the update complexity
Using SpodNet to learn sparse precision matrices
Inferring sparse precision matrices
SpodNet for sparse SPD learning
Unrolled Block Graphical-ISTA (UBG)
Plug-and-Play Block Graphical-ISTA (PNP)
End-to-end updates (E2E)
...and 23 more sections

Key Result

Proposition 3.1

Suppose the updated column-row pair is the last one ($i = p$). We partition $\bm{\Theta}$ as $\bm{\Theta} = \begin{pmatrix} \bm{\Theta}_{11} & \bm{\theta}_{12} \\ \bm{\theta}_{12}^\top & \theta_{22} \end{pmatrix}$. For a generic column $i$, $\bm{\Theta}_{11}$ refers to $\bm{\Theta}$ without its $i$-th row and $i$-th column, $\bm{\theta}_{12}$ is the $i$-th row of $\bm{\Theta}$ without its $i$-th value, and $\theta_{22}$ is $\Theta_{ii}$, as illustrated below Algorithm 1. Suppose that $\bm{\Theta} \in$ …
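The following NumPy sketch illustrates the kind of column-row update that Schur's condition permits, using the partition described above. It relies on the standard Schur-complement characterization: a symmetric matrix is positive-definite if and only if $\bm{\Theta}_{11}$ is positive-definite and $\theta_{22} - \bm{\theta}_{12}^\top \bm{\Theta}_{11}^{-1} \bm{\theta}_{12} > 0$. The function name `schur_update`, the argument `gap`, and the random values standing in for learned network outputs are assumptions made for illustration; this is not the authors' implementation.

```python
import numpy as np


def schur_update(theta, i, new_col, gap):
    """Replace row/column i of an SPD matrix while keeping the result SPD.

    new_col: the p-1 new off-diagonal values (any real vector).
    gap: strictly positive scalar used as the new Schur complement
         (a hypothetical stand-in for a learned positive quantity).
    """
    if gap <= 0:
        raise ValueError("gap must be strictly positive")
    p = theta.shape[0]
    keep = np.arange(p) != i
    theta_11 = theta[np.ix_(keep, keep)]  # theta without row i and column i
    # Schur's condition: the new diagonal entry must exceed
    # new_col^T theta_11^{-1} new_col, so we add a positive gap on top of it.
    quad = new_col @ np.linalg.solve(theta_11, new_col)
    out = theta.copy()
    out[i, keep] = new_col
    out[keep, i] = new_col
    out[i, i] = gap + quad
    return out


rng = np.random.default_rng(0)
p = 5
a = rng.standard_normal((p, p))
theta = a @ a.T + p * np.eye(p)  # an arbitrary SPD starting point

# Chain p updates, one per column-row pair, with bounded random values
# standing in for the outputs of learned update functions.
for i in range(p):
    new_col = rng.uniform(-0.1, 0.1, size=p - 1)
    theta = schur_update(theta, i, new_col, gap=1.0)

min_eig = np.linalg.eigvalsh(theta).min()
print(f"smallest eigenvalue after {p} updates: {min_eig:.4f}")
```

Each call leaves the principal submatrix $\bm{\Theta}_{11}$ untouched and sets the Schur complement of the updated entry to `gap`, so the matrix stays SPD whatever values are placed in `new_col`. The off-diagonal values are kept small here because unconstrained updates can degrade the conditioning over successive steps.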

Figures (9)

Figure 1: A SpodNet layer chains $p$ updates of column-row pairs and diagonals using neural networks. The matrices remain SPD at all times via Schur's condition.
Figure 2: GLAD's limitations. Smallest eigenvalue (orange), density degree (green) and relative discrepancy of the two matrices (blue) for an output $(\bm{Z}_\mathrm{out}, \bm{\Theta}_\mathrm{out})$ of GLAD. $\bm{\Theta}_{\mathrm{out}}$ and $\bm{Z}_{\mathrm{out}}$ are different, $\bm{\Theta}_\mathrm{out}$ is not sparse, and $\bm{Z}_\mathrm{out}$ is not positive-definite.
Figure 3: Training dynamics of our three models (on the test data described in the synthetic-data subsection). Left: The outputs of our three models remain positive-definite. Middle: The conditioning remains stable. Right: The outputs are sparse. Overall, our models produce jointly sparse and SPD outputs.
Figure 4: Learning-based (in variations of blue) vs traditional methods (in variations of red). Dotted curves indicate when one of the constraints (SPDness or sparsity) is not guaranteed. First row: Strongly sparse $\bm{\Theta}_\mathrm{true}$. Second row: Weakly sparse $\bm{\Theta}_\mathrm{true}$.
Figure 5: Comparing models from sample-deficient regimes ($n \leq p$) up to large-sample regimes ($n \gg p$), evaluated in terms of NMSE and F1 score for support recovery. Dotted curves indicate when one of the constraints (SPDness or sparsity) is not guaranteed. In large dimension, GLAD's $\bm{Z}$ performs well but is never SPD; GLAD's $\bm{\Theta}$ is SPD but performs poorly.
...and 4 more figures
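Figure 5 evaluates estimators with NMSE and an F1 score for support recovery. The sketch below shows one common way to compute both quantities with NumPy. The exact definitions used in the paper are not given here, so the normalization by the squared Frobenius norm of the target, the restriction to off-diagonal entries, and the tolerance `tol` are assumptions, as are the function names and the toy matrices.

```python
import numpy as np


def nmse(theta_hat, theta_true):
    """Squared Frobenius error divided by the squared Frobenius norm of the target."""
    num = np.linalg.norm(theta_hat - theta_true) ** 2
    return num / np.linalg.norm(theta_true) ** 2


def support_f1(theta_hat, theta_true, tol=1e-8):
    """F1 score between off-diagonal supports (entries with magnitude above tol)."""
    off = ~np.eye(theta_true.shape[0], dtype=bool)
    pred = np.abs(theta_hat[off]) > tol
    true = np.abs(theta_true[off]) > tol
    tp = np.sum(pred & true)
    if tp == 0:
        return 0.0
    precision = tp / pred.sum()
    recall = tp / true.sum()
    return 2 * precision * recall / (precision + recall)


# Toy example (hypothetical values): the estimate adds one spurious edge.
theta_true = np.array([[2.0, 1.0, 0.0],
                       [1.0, 2.0, 0.0],
                       [0.0, 0.0, 2.0]])
theta_hat = np.array([[2.0, 1.0, 0.5],
                      [1.0, 2.0, 0.0],
                      [0.5, 0.0, 2.0]])

err = nmse(theta_hat, theta_true)       # 0.5 / 14
f1 = support_f1(theta_hat, theta_true)  # precision 0.5, recall 1.0 -> 2/3
print(f"NMSE = {err:.4f}, support F1 = {f1:.4f}")
```

In this toy case the estimate recovers every true edge but adds a spurious one, so recall is 1 while precision is 0.5.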

Theorems & Definitions (4)

Proposition 3.1
proof
Proposition 3.2
proof
