Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry

Rui Wang, Shaocheng Jin, Ziheng Chen, Xiaoqing Luo, Xiao-Jun Wu

TL;DR

Covariance matrices live on the SPD manifold $\boldsymbol{\mathrm{S}}_{++}^d$, where ill-conditioning hinders Riemannian batch normalization (RBN). The paper introduces GBWBN, an RBN framework built on the generalized Bures-Wasserstein metric ($g^{(\theta)\text{-GBW}}$) with a learnable SPD matrix $\boldsymbol{M}$ and a power deformation parameter $\theta$, enabling a flexible normalization geometry. An RBN layer under BW is extended via a Riemannian isometry to GBWM, with the bias $\boldsymbol{\mathrm{G}}$ and the metric components trained using Riemannian stochastic gradient descent. Empirical results on skeleton-based action recognition (HDM05, NTU RGB+D) and EEG/SSVEP (MAMEM-SSVEP-II) demonstrate improved conditioning and accuracy over AIM-based and other RBN baselines, validating the practical impact of geometry-aware SPD normalization.

Abstract

Covariance matrices have proven highly effective across many scientific fields. Since these matrices lie within the Symmetric Positive Definite (SPD) manifold, a Riemannian space with intrinsic non-Euclidean geometry, the primary challenge in representation learning is to respect this underlying geometric structure. Drawing inspiration from the success of Euclidean deep learning, researchers have developed neural networks on SPD manifolds for more faithful covariance embedding learning. A notable advancement in this area is the implementation of Riemannian batch normalization (RBN), which has been shown to improve the performance of SPD network models. Nonetheless, the Riemannian metric underlying existing RBN methods might fail to deal effectively with ill-conditioned SPD matrices (ICSM), undermining the effectiveness of RBN. In contrast, the Bures-Wasserstein metric (BWM) demonstrates superior performance under ill-conditioning. In addition, the recently introduced Generalized BWM (GBWM) parameterizes the vanilla BWM via an SPD matrix, allowing for a more nuanced representation of vibrant geometries of the SPD manifold. Therefore, we propose a novel RBN algorithm based on the GBW geometry, incorporating a learnable metric parameter. Moreover, the deformation of GBWM by matrix power is also introduced to further enhance the representational capacity of GBWM-based RBN. Experimental results on different datasets validate the effectiveness of our proposed method.
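
To make the phrase "parameterizes the vanilla BWM via an SPD matrix" concrete, the following is a minimal NumPy sketch, not the authors' implementation. It uses the standard closed form of the squared Bures-Wasserstein distance, $\mathrm{tr}(A) + \mathrm{tr}(B) - 2\,\mathrm{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$, and assumes that the generalized distance with parameter $M$ can be evaluated by first applying the congruence $X \mapsto M^{-1/2} X M^{-1/2}$ and then measuring the plain BW distance. The function names `sym_sqrt`, `bw_distance_sq`, and `gbw_distance_sq` are illustrative, and the power deformation $\theta$ is not modeled here.

```python
import numpy as np


def sym_sqrt(S):
    """Principal square root of a symmetric PSD matrix via eigendecomposition."""
    S = 0.5 * (S + S.T)              # remove round-off asymmetry
    w, V = np.linalg.eigh(S)
    w = np.clip(w, 0.0, None)        # guard against tiny negative eigenvalues
    return (V * np.sqrt(w)) @ V.T


def bw_distance_sq(A, B):
    """Squared Bures-Wasserstein distance between SPD matrices A and B."""
    A_half = sym_sqrt(A)
    cross = sym_sqrt(A_half @ B @ A_half)
    return float(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross))


def gbw_distance_sq(A, B, M):
    """Squared generalized BW distance with SPD parameter M.

    Assumption: computed as the BW distance after the congruence
    X -> M^{-1/2} X M^{-1/2}; M = I recovers the plain BW distance.
    """
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    M_inv_half = (V / np.sqrt(w)) @ V.T
    return bw_distance_sq(M_inv_half @ A @ M_inv_half,
                          M_inv_half @ B @ M_inv_half)


if __name__ == "__main__":
    A = np.diag([1.0, 4.0])
    B = np.diag([9.0, 16.0])
    print(bw_distance_sq(A, B))
    print(gbw_distance_sq(A, B, 4.0 * np.eye(2)))
```

For commuting inputs such as the diagonal matrices in the usage example, the BW formula reduces to the sum of $(\sqrt{a_i} - \sqrt{b_i})^2$ over the eigenvalue pairs, which gives a quick sanity check. In a learnable setting, $M$ would be a trainable SPD parameter rather than a fixed matrix.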

Paper Structure

This paper contains 25 sections, 4 theorems, 42 equations, 9 figures, 10 tables, 1 algorithm.

Key Result

Proposition 3.1

Given $N$ SPD matrices $\{\boldsymbol{\mathrm{X}}_{i}\}_{i=1}^{N}$ and $\boldsymbol{\mathrm{s}} \in \mathbb{R}\backslash\{0\}$, defining $\boldsymbol{\mathrm{\psi}}_{\boldsymbol{\mathrm{s}}}(\boldsymbol{\mathrm{X}}_{i})=\mathrm{Exp}_{\boldsymbol{\mathrm{I}}_{d}}(\boldsymbol{\mathrm{s}}\,\mathrm{Lo}\ldots$ [...] where $\{\omega_{i}\}_{i=1}^N$ are weights satisfying $\omega_{i}\ge 0$, $\sum_i \omega_{i}= \ldots$

Figures (9)

  • Figure 1: An overview of the computation process for GBWBN.
  • Figure 2: The heatmaps of the absolute gradient responses (computed as in matt) for the S11 model across five frequency categories on the MAMEM-SSVEP-II dataset. In each heatmap, the x-axis represents time, while the y-axis signifies the EEG channels.
  • Figure 3: The spatial topomaps of the mean absolute gradient responses (computed as in matt) across time for the S11 model on the MAMEM-SSVEP-II dataset. The brain region marked in dark red corresponds to channel Oz, which shows strong gradient activation across the visual cortex for all stimulation frequencies.
  • Figure 4: Illustration of ill-conditioned $2 \times 2$ SPD matrices (left) and the output (right) of the GBWBN layer. The black dots are positive semi-definite matrices ($\kappa=\infty$), denoting the SPD boundary, while the interior of the cone is the SPD manifold.
  • Figure 5: Numerical comparison of the Log and Exp maps under AIM and BWM, where we select the identity matrix as the base point for simplicity. Here, $x$-axis denotes an eigenvalue and $y$-axis is the corresponding output eigenvalue.
  • ...and 4 more figures
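
The eigenvalue comparison described in the Figure 5 caption can be reproduced in a few lines. The sketch below is not taken from the paper. It relies on the standard closed forms at the identity base point, which the text above does not state: under AIM, $\mathrm{Log}_{I}$ and $\mathrm{Exp}_{I}$ act on eigenvalues as $\lambda \mapsto \ln\lambda$ and $\mu \mapsto e^{\mu}$, while under BWM they act as $\lambda \mapsto 2(\sqrt{\lambda}-1)$ and $\mu \mapsto (1+\mu/2)^2$. The function names are illustrative.

```python
import numpy as np


def aim_log_at_identity(lam):
    """AIM Log map at I, applied to eigenvalues: lam -> ln(lam)."""
    return np.log(lam)


def aim_exp_at_identity(mu):
    """AIM Exp map at I, applied to eigenvalues: mu -> exp(mu)."""
    return np.exp(mu)


def bw_log_at_identity(lam):
    """BW Log map at I, applied to eigenvalues: lam -> 2 (sqrt(lam) - 1)."""
    return 2.0 * (np.sqrt(lam) - 1.0)


def bw_exp_at_identity(mu):
    """BW Exp map at I, applied to eigenvalues: mu -> (1 + mu / 2)^2.

    It inverts bw_log_at_identity only for mu > -2.
    """
    return (1.0 + 0.5 * mu) ** 2


if __name__ == "__main__":
    eigenvalues = np.array([1e-6, 1e-3, 1.0, 1e3])
    for lam in eigenvalues:
        print(f"lambda={lam:9.1e}  "
              f"AIM log={aim_log_at_identity(lam):9.3f}  "
              f"BW log={bw_log_at_identity(lam):9.3f}")
```

As $\lambda \to 0$, the AIM logarithm diverges to $-\infty$ whereas the BW logarithm stays above $-2$, which is one way to see why near-singular eigenvalues are treated more gently under BWM.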

Theorems & Definitions (8)

  • Proposition 3.1
  • proof
  • Proposition 3.2: Deformation
  • proof
  • Proposition 3.3: Locally Deformed AIM
  • proof
  • Theorem 3.4
  • proof