Subspace Kernel Learning on Tensor Sequences

Lei Wang; Xi Ding; Yongsheng Gao; Piotr Koniusz

Abstract

Learning from structured multi-way data, represented as higher-order tensors, requires capturing complex interactions across tensor modes while remaining computationally efficient. We introduce Uncertainty-driven Kernel Tensor Learning (UKTL), a novel kernel framework for $M$-mode tensors that compares mode-wise subspaces derived from tensor unfoldings, yielding an expressive and robust similarity measure. To handle large-scale tensor data, we propose a scalable Nyström kernel linearization with dynamically learned pivot tensors obtained via soft $k$-means clustering. A key innovation of UKTL is its uncertainty-aware subspace weighting, which adaptively down-weights unreliable mode components based on estimated confidence, improving robustness and interpretability in comparisons between input and pivot tensors. Our framework is fully end-to-end trainable and naturally incorporates both multi-way and multi-mode interactions through structured kernel compositions. Extensive evaluations on action recognition benchmarks (NTU-60, NTU-120, Kinetics-Skeleton) show that UKTL achieves state-of-the-art performance, superior generalization, and meaningful mode-wise insights. This work establishes a principled, scalable, and interpretable kernel learning paradigm for structured multi-way and multi-modal tensor sequences.
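The idea of comparing mode-wise subspaces derived from tensor unfoldings can be illustrated with a minimal NumPy sketch. It is not the paper's kernel: it assumes the standard projection kernel $\|\mathbf{U}^\top\mathbf{V}\|_F^2$ between subspaces and a plain sum over modes, and it omits the uncertainty weighting. The function names (`unfold`, `mode_subspaces`, `projection_kernel`, `modewise_kernel`) and the `rank` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-m matricization: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_subspaces(tensor, rank):
    """One orthonormal basis per mode: leading left singular vectors of each unfolding."""
    bases = []
    for mode in range(tensor.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :min(rank, u.shape[1])])
    return bases

def projection_kernel(u, v):
    """Projection kernel between two subspaces (assumed here, not taken from the paper)."""
    return float(np.linalg.norm(u.T @ v, "fro") ** 2)

def modewise_kernel(x, y, rank=2):
    """Illustrative composition: sum of per-mode projection kernels."""
    pairs = zip(mode_subspaces(x, rank), mode_subspaces(y, rank))
    return sum(projection_kernel(u, v) for u, v in pairs)

# Two random 3-mode tensors of the same shape (synthetic data, for illustration only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 3))
y = rng.standard_normal((4, 5, 3))
k_xx = modewise_kernel(x, x)
k_xy = modewise_kernel(x, y)
```

Under these assumptions, each mode contributes at most `rank` to the kernel value, so a tensor compared with itself attains the maximum `rank * M`, and any other pair scores between zero and that bound.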

Paper Structure (28 sections, 33 equations, 4 figures, 4 tables)

Introduction
Related Work
Preliminaries
Method
Tensor Representations of Sequences
Sum-Product Grassmann Kernel
Uncertainty-Driven Subspace Learning
Nyström Kernel Linearization
Experiment
Datasets and Setups
Comparisons with the State of the Art
Ablation Study
Conclusion
Derivative of Uncertainty-driven Kernelized Tensor Learning
Maximum Likelihood Interpretation of Mode-wise Uncertainty
...and 13 more sections

Figures (4)

Figure 1: Mode-wise factor matrices from the Tucker decomposition for the action "draw x". Each row shows one latent factor (of the 4 leading factors), and each column corresponds to one tensor mode: temporal block, body joints, 3D coordinates, and time. Structured patterns reveal interpretable, mode-specific information, which motivates our approach.
Figure 2: Overview of the proposed Uncertainty-driven Kernel Tensor Learning (UKTL) pipeline for action recognition. For brevity, we use skeletons as an example. Each skeleton sequence is divided into temporal blocks $\mathbf{B}_1,\ldots,\mathbf{B}_\tau$, embedded via an MLP, and processed by a Higher-order Transformer (HoT) to obtain a feature tensor $\boldsymbol{\mathcal{X}}_i$. These tensors undergo mode-$m$ matricization ($m=1,\ldots,M$) and SVD to extract $M$ subspaces per sample. Soft $k$-means clustering yields $C$ Nyström pivots, each also represented by $M$ subspaces. A Multi-mode SigmaNet (MSN) estimates uncertainty vectors over all subspaces, which are used to regularize kernel computations. The Nyström-approximated KTL maps inputs to compact, uncertainty-aware representations $\tilde{\mathbf{g}}_i$ for final classification. The entire model is trained end-to-end.
Figure 3: Ablation study evaluating the effects of subspace order (on NTU-60/120), and of kernel choice, Nyström pivots, and kernel composition (on NTU-60) within the UKTL framework.
Figure 4: Visualization of the Tucker decomposition in each mode of the tensor representations for the actions "draw x" and "draw tick".
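The Nyström kernel linearization mentioned in the Figure 2 caption can be sketched generically as follows. This is the textbook construction $\phi(\mathbf{x}) = \mathbf{K}_{ZZ}^{-1/2}\,\mathbf{k}_Z(\mathbf{x})$ over $C$ pivots, not the paper's implementation: the pivots here are fixed random vectors rather than tensors learned by soft $k$-means, an RBF kernel stands in for the tensor kernel, and no uncertainty weighting is applied. The names `rbf`, `nystrom_map`, and the parameters `gamma` and `eps` are hypothetical.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    """Stand-in kernel (assumption): RBF on vectors instead of a tensor subspace kernel."""
    return float(np.exp(-gamma * np.sum((a - b) ** 2)))

def nystrom_map(pivots, kernel, eps=1e-8):
    """Return phi with phi(x) @ phi(y) approximating kernel(x, y) via the given pivots."""
    k_zz = np.array([[kernel(p, q) for q in pivots] for p in pivots])
    # Inverse square root of the pivot kernel matrix via its eigendecomposition.
    evals, evecs = np.linalg.eigh(k_zz)
    evals = np.maximum(evals, eps)  # clamp tiny eigenvalues for numerical stability
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T

    def phi(x):
        k_zx = np.array([kernel(p, x) for p in pivots])
        return inv_sqrt @ k_zx

    return phi

# Four fixed pivots in R^3 (synthetic, for illustration only).
rng = np.random.default_rng(0)
pivots = [rng.standard_normal(3) for _ in range(4)]
phi = nystrom_map(pivots, rbf)

# On the pivots themselves the approximation reproduces the kernel up to rounding.
approx = phi(pivots[0]) @ phi(pivots[1])
exact = rbf(pivots[0], pivots[1])
```

Each input is mapped to a $C$-dimensional vector, so a linear classifier on these features approximates a kernel machine while avoiding the full pairwise kernel matrix.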
