Tensor network square root Kalman filter for online Gaussian process regression

Clara Menzen; Manon Kok; Kim Batselier

TL;DR

This work introduces a tensor network square root Kalman filter (TNSRKF) for online Gaussian process regression with a very large number of parameters. The weight vector is represented as a tensor train (TT) and the square root covariance factor as a TT matrix (TTm). An alternating linear scheme (ALS) updates the mean, and a thin-SVD-based QR step truncates the covariance factor, so the covariance remains positive definite. The authors show experimentally that the method is equivalent to the conventional Kalman filter when a full-rank tensor network is chosen. On synthetic and real data they report better prediction accuracy and uncertainty quantification than the state-of-the-art tensor network Kalman filter (TNKF), including a real-life system identification problem in which $4^{14}$ parameters are estimated on a standard laptop. The code is made available for reproducibility.

Abstract

The state-of-the-art tensor network Kalman filter lifts the curse of dimensionality for high-dimensional recursive estimation problems. However, the required rounding operation can cause filter divergence due to the loss of positive definiteness of covariance matrices. We solve this issue by developing, for the first time, a tensor network square root Kalman filter, and apply it to high-dimensional online Gaussian process regression. In our experiments, we demonstrate that our method is equivalent to the conventional Kalman filter when choosing a full-rank tensor network. Furthermore, we apply our method to a real-life system identification problem where we estimate $4^{14}$ parameters on a standard laptop. The estimated model outperforms the state-of-the-art tensor network Kalman filter in terms of prediction accuracy and uncertainty quantification.
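To make the positive-definiteness argument concrete, the following is a minimal dense sketch of one square root Kalman measurement update for a scalar measurement model $y_t = \boldsymbol\phi_t^\top \mathbf{w} + \epsilon_t$ with noise variance $\sigma^2$. It is not the tensor network implementation from the paper: the function name `srkf_update`, the static-weight model, and the use of a plain QR factorization are assumptions made for illustration. The point is that the covariance is never formed or rounded directly; only a factor $\mathbf{L}_t$ is propagated, and $\mathbf{L}_t\mathbf{L}_t^\top$ is positive semidefinite by construction.

```python
import numpy as np

def srkf_update(w, L, phi, y, sigma2):
    """One dense square-root Kalman update (illustrative sketch).

    w      : (D,) prior mean
    L      : (D, D) square-root factor, prior covariance P = L @ L.T
    phi    : (D,) feature vector of the scalar measurement y
    sigma2 : measurement noise variance
    """
    Pphi = L @ (L.T @ phi)            # P @ phi without forming P
    s = phi @ Pphi + sigma2           # innovation variance
    K = Pphi / s                      # Kalman gain
    w_new = w + K * (y - phi @ w)     # mean update

    # Augmented factor A of size D x (D + 1) such that
    # A @ A.T = (I - K phi^T) P (I - K phi^T)^T + sigma2 * K K^T
    A = np.hstack([L - np.outer(K, phi @ L),
                   np.sqrt(sigma2) * K[:, None]])

    # QR of A.T gives A @ A.T = R.T @ R, so R.T is a square D x D factor.
    R = np.linalg.qr(A.T, mode="r")
    return w_new, R.T
```

In this dense form the augmented factor has one extra column per measurement; the QR step brings it back to a square factor without changing the covariance it represents.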

Paper Structure (24 sections, 2 theorems, 19 equations, 10 figures, 2 tables, 2 algorithms)


Problem Formulation
Background on tensor networks
Tensor networks
Tensor train vectors
TT matrices and tall TT matrices
Tensor-networked SRKF
Update of weight mean
Update of square root covariance factor
Predictions
Implementation
Updating $\hat{\mathbf{w}}_t$ in TN format
Implementation of $\mathbf{G}_{d,t}^\top(\hat{\mathbf{w}}_{t-1} + \mathbf{K}_t(y_t-\boldsymbol\phi_t^\top \hat{\mathbf{w}}_{t-1}))$
Initialization of $\hat{\mathbf{w}}_0$ and $\mathbf{w}_1$
Updating $\mathbf{L}_t$ in TT format
Implementation of first term of \ref{eq:ALScov}
...and 9 more sections

Key Result

Lemma 5

Zero-mean prior in TT format (Batselier2019extended). Consider a vector with all entries equal to zero. In TT format, such a vector is given by a TT in site-$d$-mixed canonical format, where the $d$th TT-core contains only zeros.
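The lemma can be checked numerically with a small sketch. The helper names `zero_tt` and `tt_to_vector`, the uniform TT-rank, the random choice of the orthogonal cores, and the 0-based core index `d` are all assumptions made for illustration; only the structure (orthogonal cores on both sides of an all-zero $d$th core) comes from the lemma.

```python
import numpy as np

def zero_tt(dims, rank, d, seed=0):
    """TT cores of a zero vector in site-d-mixed canonical form (0-based d).

    Cores left of d are left-orthogonal, cores right of d are
    right-orthogonal, and core d contains only zeros.
    """
    rng = np.random.default_rng(seed)
    D = len(dims)
    ranks = [1] + [rank] * (D - 1) + [1]
    cores = []
    for k, I in enumerate(dims):
        r0, r1 = ranks[k], ranks[k + 1]
        if k == d:
            core = np.zeros((r0, I, r1))
        elif k < d:
            # Left-orthogonal: the (r0*I) x r1 unfolding has orthonormal columns.
            Q, _ = np.linalg.qr(rng.standard_normal((r0 * I, r1)))
            core = Q.reshape(r0, I, r1)
        else:
            # Right-orthogonal: the r0 x (I*r1) unfolding has orthonormal rows.
            Q, _ = np.linalg.qr(rng.standard_normal((I * r1, r0)))
            core = Q.T.reshape(r0, I, r1)
        cores.append(core)
    return cores

def tt_to_vector(cores):
    """Contract all TT cores into the full vector (small sizes only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape(-1)
```

For example, `tt_to_vector(zero_tt([4] * 5, rank=3, d=2))` is a zero vector of length $4^5$, while the cores themselves store far fewer numbers than the full vector.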

Figures (10)

Figure 1: Visual depiction of tensor diagrams for a (a) TT, (b) TTm, (c) tall TTm and (d) thin SVD.
Figure 2: Visual depiction of (a) predictive mean and (b) predictive covariance for $D=5$.
Figure 3: Visual depiction of computation of $\mathbf{G}_{d,t}^\top\hat{\mathbf{w}}_{t-1}$, resulting in three-way tensor of size $R_\mathbf{w}\times I \times R_\mathbf{w}$ (gray node). The indices are summed over from left to right, alternating between the vertical and horizontal ones.
Figure 4: Visual depiction for computing the augmented TTm-core in \ref{eq:term1_eq} resulting in a 4-way tensor of size $R_\mathbf{L}\times I\times 2J\times R_\mathbf{L}$ (gray node). The combined horizontal and curved indices are summed over and alternating with the horizontal indices.
Figure 5: Visual depiction for computing \ref{eq:term2_eq2} resulting in a 4-way tensor of size $R_\mathbf{L}\times I\times 2J\times R_\mathbf{L}$ (gray node). First, the indices in the red and blue boxes are summed over, then the indices between the red, yellow, and blue boxes, and finally, the ones between the red, green, and blue boxes.
...and 5 more figures
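
The left-to-right contraction pattern described in these captions, alternating between vertical (mode) and horizontal (TT-rank) indices, can be sketched for the simplest case, a predictive mean $\boldsymbol\phi^\top\hat{\mathbf{w}}$ with the weights in TT format. This sketch assumes that the feature vector is a Kronecker product of per-dimension feature vectors; that assumption, the function name `tt_predict_mean`, and the core layout (left rank, mode, right rank) are illustrative and not taken from the summary above.

```python
import numpy as np

def tt_predict_mean(cores, feats):
    """Compute phi^T w for w in TT format and phi = kron(feats[0], ..., feats[-1]).

    cores : list of arrays of shape (r_prev, I_k, r_next), with boundary ranks 1
    feats : list of per-dimension feature vectors of shape (I_k,)
    """
    msg = np.ones(1)  # running row vector over the current TT-rank index
    for core, phi in zip(cores, feats):
        # Vertical index: contract the mode index of the core with phi.
        mat = np.einsum("aib,i->ab", core, phi)
        # Horizontal index: contract the shared TT-rank index.
        msg = msg @ mat
    return msg.item()
```

The full weight vector and the full feature vector are never formed, so the cost grows linearly in the number of dimensions rather than exponentially.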

Theorems & Definitions (7)

Definition 1
Definition 2
Definition 3
Example 4
Lemma 5
Example 6
Lemma 7
