Learning in Feature Spaces via Coupled Covariances: Asymmetric Kernel SVD and Nyström method

Qinghua Tao; Francesco Tonin; Alex Lambert; Yingyi Chen; Panagiotis Patrinos; Johan A. K. Suykens

TL;DR

This paper develops a principled framework for learning in asymmetric feature spaces by formulating a Coupled Covariances Eigenproblem (CCE) that jointly learns two sets of directions in a shared Hilbert space and reduces to the SVD of an asymmetric kernel matrix $G$, thereby connecting to KSVD while accommodating infinite-dimensional mappings. It extends the Nyström method to asymmetric kernels via adjoint-eigenfunction theory, enabling scalable, out-of-sample computation for KSVD-like embeddings. The authors provide extensive empirical validation on directed graphs, biclustering, and general data, showing that combining asymmetry with nonlinearity yields superior embeddings for downstream tasks, and demonstrate substantial speedups through the proposed asymmetric Nyström method. Overall, the work offers a covariance-based, kernel-theoretic approach to asymmetric learning with practical scalability and broad applicability to representation learning in graphs and multi-view data.
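As a rough illustration of the reduction to an SVD, the sketch below builds a small matrix $G$ with entries $\frac{1}{\sqrt{nm}}\langle \phi(x_i), \psi(z_j) \rangle$ (the scaling used in Proposition 2.2) and takes its leading singular triplets. The finite-dimensional feature maps `phi` and `psi`, the random weights, and the helper names are assumptions made only for this example; they are not the kernels or code used in the paper.

```python
import numpy as np

def asymmetric_kernel_matrix(X, Z, phi, psi):
    """G[i, j] = <phi(x_i), psi(z_j)> / sqrt(n * m) for explicit feature maps."""
    n, m = X.shape[0], Z.shape[0]
    return phi(X) @ psi(Z).T / np.sqrt(n * m)

def top_singular_triplets(G, r):
    """Leading r left singular vectors, singular values, right singular vectors."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r].T

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))   # n = 6 samples on the "row" side
Z = rng.normal(size=(4, 3))   # m = 4 samples on the "column" side

# Hypothetical, deliberately different feature maps (illustration only).
W_phi = rng.normal(size=(3, 5))
W_psi = rng.normal(size=(3, 5))
phi = lambda A: np.tanh(A @ W_phi)
psi = lambda A: np.cos(A @ W_psi)

G = asymmetric_kernel_matrix(X, Z, phi, psi)
U, s, V = top_singular_triplets(G, r=2)
```

Because `phi` and `psi` differ, `G` is rectangular and carries no symmetry, yet the left and right singular vectors remain coupled through `G @ V == U * s` and `G.T @ U == V * s`.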

Abstract

In contrast with Mercer kernel-based approaches such as Kernel Principal Component Analysis (KPCA), it was previously shown that Singular Value Decomposition (SVD) inherently relates to asymmetric kernels, and Asymmetric Kernel Singular Value Decomposition (KSVD) has been proposed. However, the existing formulation of KSVD cannot work with infinite-dimensional feature mappings, its variational objective can be unbounded, and it still needs further numerical evaluation and exploration for machine learning. In this work, i) we introduce a new asymmetric learning paradigm based on the coupled covariances eigenproblem (CCE) through covariance operators, allowing infinite-dimensional feature maps. The solution to CCE is ultimately obtained from the SVD of the induced asymmetric kernel matrix, providing links to KSVD. ii) Starting from the integral equations corresponding to a pair of coupled adjoint eigenfunctions, we formalize the asymmetric Nyström method through a finite-sample approximation to speed up training. iii) We provide the first empirical evaluations verifying the practical utility and benefits of KSVD, and compare it with methods resorting to symmetrization or linear SVD across multiple tasks.
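The idea behind a Nyström-type approximation for a rectangular kernel matrix can be sketched as follows: take the SVD of a small submatrix, then extend its singular vectors to all rows and columns using the coupled relations $u = Gv/\sigma$ and $v = G^\top u/\sigma$. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name, the uniform sampling, and the synthetic low-rank `G` are chosen for illustration, and the rescaling factors of the paper's formulation are not reproduced.

```python
import numpy as np

def nystrom_svd_sketch(G, rows, cols, r):
    """Extend the rank-r SVD of the submatrix G[rows, cols] to the full index sets."""
    W = G[np.ix_(rows, cols)]                      # small sampled block
    Uw, s, Vwt = np.linalg.svd(W, full_matrices=False)
    Uw, s, Vw = Uw[:, :r], s[:r], Vwt[:r].T
    U_ext = G[:, cols] @ Vw / s                    # n x r, via u = G v / sigma
    V_ext = G[rows, :].T @ Uw / s                  # m x r, via v = G^T u / sigma
    return U_ext, s, V_ext

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3
# Synthetic matrix of exact rank r (assumption made for the demonstration).
G = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))

rows = rng.choice(n, size=10, replace=False)
cols = rng.choice(m, size=8, replace=False)
U_ext, s, V_ext = nystrom_svd_sketch(G, rows, cols, r)

# Low-rank reconstruction from the sampled rows and columns only.
G_hat = U_ext @ np.diag(s) @ V_ext.T
```

In this exactly low-rank setting the reconstruction matches `G` up to floating-point error. Note that `s` holds the singular values of the sampled block, not of `G`, and the extended vectors are not orthonormal. For a kernel matrix with a decaying but full spectrum the result is only an approximation, whose quality depends on how many rows and columns are sampled.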

Paper Structure (43 sections, 3 theorems, 25 equations, 6 figures, 9 tables, 1 algorithm)

Introduction
Learning in Feature Spaces with Asymmetry
Asymmetric Similarity
Coupled Covariances Eigenproblem
Notation.
Construction of the Subspaces in $\mathcal{H}$.
Projection Operators.
CCE versus 2KPCA.
Related work
Asymmetric Kernel SVD.
Symmetric Kernel Approaches with Covariances.
Nyström Method for Asymmetric Kernels
Adjoint Eigenfunctions
Nyström Approximation for the Adjoint Eigenfunctions
Nyström Approximation to Asymmetric Kernel Matrices
...and 28 more sections

Key Result

Proposition 2.2

Let $G \in \mathbb{R}^{n \times m}$ such that $g_{ij} = \frac{1}{\sqrt{nm}}\langle \phi(x_i), \psi(z_j) \rangle$. For all $B_\phi \in \mathbb{R}^{n \times r}$ and $B_\psi \in \mathbb{R}^{m \times r}$, it holds that

Figures (6)

Figure 1: Illustrative example of asymmetric similarity. In a directed graph, each node can act as the source or the target. Given the adjacency matrix $[a(v_i, v_j)]_{i,j=1}^N$, its rows relate to the outgoing edges, while the columns relate to the incoming edges. The connections between nodes are directional, so in general $a(v_i, v_j) \neq a(v_j, v_i)$ for $i \neq j$.
Figure 2: Schematic of our construction. $\mathcal{X},\mathcal{Z}$ from $A$ are mapped to a possibly infinite-dimensional space $\mathcal{H}$. We propose to consider the coupled scalar products of $\psi(z_j)$ with $w^\phi_l$ and of $\phi(x_i)$ with $w^\psi_l$. $\mathcal{H}$ is shown separately for clarity.
Figure 3: Overview comparison of KPCA and CCE.
Figure 4: Varying singular spectrum. Number of samples $m$ (green) to achieve a fixed tolerance and the speedup factor w.r.t. RSVD (blue) on Cora when the singular spectrum of $G$ changes (larger $\gamma$ leads to faster decay).
Figure 5: Effect of $m$. Performance on Cora at different $m$ by asymmetric Nyström. Dashed lines indicate the exact solution.
...and 1 more figure

Theorems & Definitions (8)

Definition 2.1: CCE
Proposition 2.2
Remark 2.3: Dimensionality Compatibility Matrix
Proposition 2.4
Proposition 2.5
proof
proof
proof
