
Robust Automatic Differentiation of Square-Root Kalman Filters via Gramian Differentials

Adrien Corenflos

Abstract

Square-root Kalman filters propagate state covariances in Cholesky-factor form for numerical stability, and are a natural target for gradient-based parameter learning in state-space models. Their core operation, triangularization of a matrix $M \in \mathbb{R}^{n \times m}$, is computed in practice via a QR decomposition, but naively differentiating through it causes two problems: the semi-orthogonal factor is non-unique when $m > n$, yielding undefined gradients; and the standard Jacobian formula involves inverses that diverge when $M$ is rank-deficient. Both are resolved by the observation that all filter outputs relevant to learning depend on the input matrix only through the Gramian $MM^\top$, so the composite loss is smooth in $M$ even where the triangularization is not. We derive a closed-form chain rule directly from the differential of this Gramian identity, prove it exact for the Kalman log-marginal likelihood and filtered moments, and extend it to rank-deficient inputs via a two-component decomposition: a column-space term based on the Moore--Penrose pseudoinverse, and a null-space correction for perturbations outside the column space of $M$.
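
The Gramian observation can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the $2 \times 3$ matrix `M`, and the observation `y` below are hypothetical, and the loss is a zero-mean Gaussian log-density with covariance $MM^\top$, chosen as a stand-in for one log-marginal likelihood term. Because the loss touches `M` only through $MM^\top$, its gradient follows from $d(MM^\top) = dM\,M^\top + M\,dM^\top$ without differentiating any triangular or orthogonal factor.

```python
import math

def gramian(M):
    """Return S = M M^T for M given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in M] for ri in M]

def inv2(S):
    """Inverse and determinant of a 2x2 matrix (assumed non-singular)."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]], det

def loglik(M, y):
    """Log-density of y under N(0, M M^T); depends on M only via the Gramian."""
    S_inv, det = inv2(gramian(M))
    quad = sum(y[i] * S_inv[i][j] * y[j] for i in range(2) for j in range(2))
    return -0.5 * (math.log(det) + quad + 2 * math.log(2 * math.pi))

def loglik_grad(M, y):
    """Gradient w.r.t. M from the Gramian differential: 2 G M, with G = dl/dS."""
    S_inv, _ = inv2(gramian(M))
    w = [sum(S_inv[i][j] * y[j] for j in range(2)) for i in range(2)]  # S^{-1} y
    # G = -1/2 (S^{-1} - S^{-1} y y^T S^{-1}), a symmetric 2x2 matrix
    G = [[-0.5 * (S_inv[i][j] - w[i] * w[j]) for j in range(2)] for i in range(2)]
    return [[2.0 * sum(G[i][k] * M[k][j] for k in range(2))
             for j in range(len(M[0]))] for i in range(2)]

def finite_diff(M, y, i, j, eps=1e-6):
    """Central finite difference of loglik in entry (i, j) of M."""
    Mp = [row[:] for row in M]
    Mm = [row[:] for row in M]
    Mp[i][j] += eps
    Mm[i][j] -= eps
    return (loglik(Mp, y) - loglik(Mm, y)) / (2 * eps)

# Hypothetical 2x3 input (m > n) and observation, for illustration only.
M = [[1.0, 0.5, 0.2], [0.3, 2.0, 0.1]]
y = [0.4, -1.0]

grad = loglik_grad(M, y)
max_err = max(abs(grad[i][j] - finite_diff(M, y, i, j))
              for i in range(2) for j in range(3))
```

Here `max_err` compares the closed-form gradient against central finite differences. The sketch covers only a full-row-rank `M`, since the Gaussian density needs an invertible $MM^\top$; it does not implement the pseudoinverse or null-space terms the abstract describes for rank-deficient inputs.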

Paper Structure (12 sections, 5 theorems, 15 equations, 1 figure)


Key Result

Lemma 1

Let $\ell: \mathbb{R}^{n \times n}_{\mathrm{sym}} \to \mathbb{R}$ be smooth and suppose $\ell$ depends on $L = \mathcal{T}(M)$ only through the Gramian $\Sigma = LL^\top$. Then $M \mapsto \ell(\mathcal{T}(M))$ factors as the composition of the polynomial map $M \mapsto MM^\top$ with a smooth function [...]
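
As a worked instance of the chain rule this factorization suggests, assume $\ell(\mathcal{T}(M)) = f(MM^\top)$ for some smooth $f$, and write $G = \partial f / \partial \Sigma$ evaluated at $\Sigma = MM^\top$ (the symbols $f$ and $G$ are introduced here for illustration and are not taken from the source). With the Frobenius inner product $\langle A, B \rangle = \operatorname{tr}(A^\top B)$:

$$
d\Sigma = dM\,M^\top + M\,dM^\top, \qquad
d\ell = \langle G, d\Sigma \rangle = \langle (G + G^\top) M,\, dM \rangle, \qquad
\nabla_M \ell = (G + G^\top)\, M .
$$

For symmetric $G$ this reduces to $\nabla_M \ell = 2GM$. No inverse of $M$ or of a triangular factor appears in the expression, so under this assumption it remains well defined wherever $f$ is smooth at $\Sigma$.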

Figures (1)

  • Figure 1: Log-marginal likelihood for the model in Section \ref{sec:experiments} as a function of $\alpha$, with AD-computed tangents drawn as arrows on the curve. The curve and tangents agree.

Theorems & Definitions (12)

  • Definition 1: Triangularization Operator
  • Lemma 1: Smooth factorization
  • proof
  • Proposition 1: Gramian sufficiency
  • proof
  • Definition 2: Surrogate JVP
  • Proposition 2: Verification
  • proof
  • Remark 1
  • Proposition 3: Linearity
  • ...and 2 more