Low-rank plus diagonal approximations for Riccati-like matrix differential equations

Silvère Bonnabel; Marc Lambert; Francis Bach

TL;DR

This work addresses the approximation of large, time-dependent PSD matrices that arise as solutions of matrix differential equations. It introduces manifolds of low-rank plus diagonal (and low-rank plus isotropic) PSD matrices and derives closed-form projections of tangent vectors onto them, at a cost linear in $d$, while keeping the approximations invertible through Woodbury identities. The framework is specialized to Riccati-like equations and demonstrated in two settings: a Wasserstein gradient flow for Gaussian variational inference and high-dimensional Kalman filtering, yielding tractable probabilistic PCA (PPCA) and factor analysis (FA) variants. The resulting covariance representations are invertible and memory-efficient, which improves estimation stability and makes large-scale statistical and control problems computationally tractable.

Abstract

We consider the problem of computing tractable approximations of large time-dependent $d \times d$ positive semi-definite (PSD) matrices defined as solutions of a matrix differential equation. We propose to use "low-rank plus diagonal" PSD matrices as approximations, which can be stored with a memory cost linear in the high dimension $d$. To constrain the solution of the differential equation to remain in that subset, we project the derivative at all times onto the tangent space to the subset, following the methodology of dynamical low-rank approximation. We derive a closed-form formula for the projection, and show that after some manipulations it can be computed with a numerical cost linear in $d$, allowing for tractable implementation. Contrary to previous approaches based on pure low-rank approximations, the addition of the diagonal term makes our approximations invertible matrices, which can moreover be inverted with a cost linear in $d$. We apply the technique to Riccati-like equations, then to two particular problems: first, a low-rank approximation to our recent Wasserstein gradient flow for Gaussian approximation of posterior distributions in approximate Bayesian inference, and second, a novel low-rank approximation of the Kalman filter for high-dimensional systems. Numerical simulations illustrate the results.
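To illustrate why the diagonal term makes inversion cheap, here is a minimal NumPy sketch of a linear solve with a matrix of the form $\mathrm{diag}(\psi) + URU^T$ via the Woodbury identity. The parametrization (a positive vector `psi`, a $d \times p$ factor `U`, a $p \times p$ matrix `R`) and the function name `woodbury_solve` are assumptions made for illustration, not the paper's implementation.

```python
import numpy as np

def woodbury_solve(psi, U, R, b):
    """Solve (diag(psi) + U R U^T) x = b without forming the d x d matrix.

    psi: (d,) positive diagonal, U: (d, p), R: (p, p) invertible, b: (d,).
    All names here are hypothetical, chosen for this sketch.
    """
    # The diagonal part is inverted elementwise: O(d) and O(d p).
    Dinv_b = b / psi
    Dinv_U = U / psi[:, None]
    # Small p x p matrix R^{-1} + U^T D^{-1} U: O(d p^2) to build.
    cap = np.linalg.inv(R) + U.T @ Dinv_U
    # Woodbury: x = D^{-1} b - D^{-1} U cap^{-1} U^T D^{-1} b.
    return Dinv_b - Dinv_U @ np.linalg.solve(cap, U.T @ Dinv_b)

# Small check against the dense matrix (built here only for verification).
rng = np.random.default_rng(0)
d, p = 500, 5
psi = rng.uniform(0.5, 2.0, size=d)
U, _ = np.linalg.qr(rng.standard_normal((d, p)))
R = np.diag(rng.uniform(1.0, 3.0, size=p))
b = rng.standard_normal(d)

x = woodbury_solve(psi, U, R, b)
Y = np.diag(psi) + U @ R @ U.T
residual = np.linalg.norm(Y @ x - b)
```

Only the vector `psi`, the factor `U`, and the small matrix `R` need to be stored, so both memory and solve cost grow linearly with `d` for fixed `p`.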

Paper Structure (24 sections, 10 theorems, 67 equations, 2 figures, 1 table)

Introduction
Reminders on low-rank approximation
Geometry of $\mathrm{S^+(p,d)}$
Optimal approximations on the tangent space
Optimal low-rank plus diagonal approximation
Geometry of $\mathrm{S_{\mathrm{diag}}^+(p,d)}$ and $\mathrm{S_{\mathrm{isot}}^+(p,d)}$
Optimal approximation in the PPCA form
Optimal approximation in the FA form
Implementation
Numerically efficient formulation for the FA form
Computation cost
Application to the Riccati equation
Application to computational statistics
Wasserstein gradient flow for Gaussian variational inference
Particular case of a Gaussian target
...and 9 more sections

Key Result

Proposition 1

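The orthogonal projection of a symmetric matrix $H$ onto $\mathcal{T}_Y \mathrm{S^+(p,d)}$ at $Y=URU^T$ is, in the retained form of tangent vectors (tan:form:eq), $\delta Y= \delta U RU^T+U \delta RU^T+UR\delta U^T$, where the matrices are given by:

$$\delta U = (I - UU^T) H U R^{-1}, \qquad \delta R = U^T H U.$$

The tangent vector then writes

$$\delta Y = HUU^T + UU^TH - UU^THUU^T.$$

This choice solves problem (Lubich3:eq), that is, it minimizes over matrices of the form $\delta Y= \delta U RU^T+U \delta RU^T+UR\delta U^T$ with constraints (tan:form:eq) the Frobenius norm $\|H - \delta Y\|_F$.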
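The projection in Proposition 1 can be checked numerically. The sketch below assumes the standard setting for this result: $U$ has orthonormal columns, $R$ is symmetric positive definite, the gauge condition is $U^T \delta U = 0$, and orthogonality is with respect to the Frobenius inner product. Under those assumptions the components are $\delta R = U^T H U$ and $\delta U = (I - UU^T) H U R^{-1}$; the function names are hypothetical.

```python
import numpy as np

def project_tangent(U, R, H):
    """Components (dU, dR) of the projection of symmetric H at Y = U R U^T.

    Assumes U^T U = I and the gauge U^T dU = 0 (assumptions of this sketch).
    """
    HU = H @ U
    dR = U.T @ HU                          # p x p symmetric block
    dU = (HU - U @ dR) @ np.linalg.inv(R)  # (I - U U^T) H U R^{-1}
    return dU, dR

def tangent_vector(U, R, dU, dR):
    """Assemble dY = dU R U^T + U dR U^T + U R dU^T (dense, for checking only)."""
    return dU @ R @ U.T + U @ dR @ U.T + U @ R @ dU.T

rng = np.random.default_rng(1)
d, p = 50, 4
U, _ = np.linalg.qr(rng.standard_normal((d, p)))
A = rng.standard_normal((p, p))
R = A @ A.T + np.eye(p)                    # symmetric positive definite
B = rng.standard_normal((d, d))
H = 0.5 * (B + B.T)                        # symmetric input

dU, dR = project_tangent(U, R, H)
P = tangent_vector(U, R, dU, dR)

# Closed form of the same tangent vector, and the orthogonality of the residual.
Pi = U @ U.T
closed_form = H @ Pi + Pi @ H - Pi @ H @ Pi
inner = np.sum((H - P) * P)                # Frobenius inner product <H - P, P>
```

In a large-scale setting one would keep only `dU` and `dR` and never assemble the $d \times d$ matrix; the dense products here serve only to verify the formulas on a small example.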

Figures (2)

Figure 1: $d=200$ with $p=8$ (left) and $p=50$ (right). Normalized distance between the covariance matrix computed from the true full-rank Riccati equation and those computed from the low-rank, low-rank plus diagonal (FA), and low-rank plus isotropic diagonal (PPCA) approximations, for the swarm example.
Figure 2: $d=200$ with $p=8$ (left) and $p=50$ (right). Norm of the error over time between the filters' state estimates and the full KF's optimal estimate, for a randomly picked initial error in a noise-free setting.

Theorems & Definitions (16)

Proposition 1: from LubichRouchon
Remark 1
Proposition 2
proof
Proposition 3
proof
Proposition 4
Lemma 1
Proposition 5: PPCA-Riccati
Proposition 6: FA-Riccati
...and 6 more
